(57) Abstract

Disclosed are a family of synthetic proteins having binding affinity for a preselected antigen, and multifunctional
proteins having such affinity. The proteins are characterized by one or more sequences of amino acids constituting a
region which behaves as a biosynthetic antibody binding site (BABS). The sites comprise V H -V L or V L -V H -like single chains
wherein the V H and V L -like sequences are attached by a polypeptide linker, or individual V H or V L -like domains. The
binding domains comprise linked CDR and FR regions, which may be derived from separate immunoglobulins. The
proteins may also include other polypeptide sequences which function, e.g., as an enzyme, toxin, binding site, or site for
attachment to an immobilization media or radioactive atom. Methods are disclosed for producing the proteins, for designing
BABS having any specificity that can be elicited by in vivo generation of antibody, for producing analogs thereof, and for
producing multifunctional synthetic proteins which are self-targeted by virtue of their binding site region.
TARGETED MULTIFUNCTIONAL PROTEINS

The United States Government has rights in 
this application pursuant to small business
innovation research grant numbers SSS-4 R43 
CA39870-01 and SSS-4 2 R44 CA39870-02. 

Reference to Related Applications

This application is a continuation-in-part 
of copending U.S. application serial number 052,800 
filed May 21, 1987, the disclosure of which is 
incorporated herein by reference. 

Background of the Invention

This invention relates to novel compositions 
of matter, hereinafter called targeted 
multifunctional proteins, useful, for example, in 
specific binding assays, affinity purification, 
biocatalysis, drug targeting, imaging, immunological 
treatment of various oncogenic and infectious 
diseases, and in other contexts. More specifically, 
this invention relates to biosynthetic proteins 
expressed from recombinant DNA as a single
polypeptide chain comprising plural regions, one of
which has a structure similar to an antibody binding
site, and an affinity for a preselected antigenic
determinant, and another of which has a separate
function, and may be biologically active, designed to
bind to ions, or designed to facilitate
immobilization of the protein. This invention also 
relates to the binding proteins per se, and methods 
for their construction. 

There are five classes of human antibodies. 
Each has the same basic structure (see Figure 1), or 
multiple thereof, consisting of two identical 
polypeptides called heavy (H) chains (molecular
weight approximately 50,000 d) and two identical
light (L) chains (molecular weight approximately 
25,000 d). Each of the five antibody classes has a 
similar set of light chains and a distinct set of 
heavy chains. A light chain is composed of one 
variable and one constant domain, while a heavy chain
is composed of one variable and three or more
constant domains. The combined variable domains of a
paired light and heavy chain are known as the Fv
region, or simply "Fv". The Fv determines the
specificity of the immunoglobulin; the constant
regions have other functions.

Amino acid sequence data indicate that each 
variable domain comprises three hypervariable regions 
or loops, sometimes called complementarity 
determining regions or "CDRs", flanked by four
relatively conserved framework regions or "FRs"
(Kabat et al., Sequences of Proteins of
Immunological Interest [U.S. Department of Health and
Human Services, third edition, 1983, fourth edition,
1987]). The hypervariable regions have been assumed
to be responsible for the binding specificity of
individual antibodies and to account for the
diversity of binding of antibodies as a protein class. 



Monoclonal antibodies have been used both as 
diagnostic and therapeutic agents. They are 
routinely produced according to established 
procedures by hybridomas generated by fusion of mouse 
lymphoid cells with an appropriate mouse myeloma cell 
line. 

The literature contains a host of references 
to the concept of targeting bioactive substances such
as drugs, toxins, and enzymes to specific points in 
the body to destroy or locate malignant cells or to 
induce a localized drug or enzymatic effect. It has 
been proposed to achieve this effect by conjugating 
the bioactive substance to monoclonal antibodies 
(see, e.g., Vogel, Immunoconjugates: Antibody
Conjugates in Radioimaging and Therapy of Cancer,
1987, N.Y., Oxford University Press; and Ghose et al.
(1978) J. Natl. Cancer Inst. 61:657-676). However,
non-human antibodies induce an immune response when 
injected into humans. Human monoclonal antibodies 
may alleviate this problem, but they are difficult to 
produce by cell fusion techniques since, among other 
problems, human hybridomas are notably unstable, and 
removal of immunized spleen cells from humans is not
feasible. 

Chimeric antibodies composed of human and 
non-human amino acid sequences potentially have
improved therapeutic value as they presumably would 
elicit less circulating human antibody against the 
non-human immunoglobulin sequences. Accordingly, 
hybrid antibody molecules have been proposed which 
consist of amino acid sequences from different 
mammalian sources. The chimeric antibodies designed 
thus far comprise variable regions from one mammalian
source, and constant regions from human or another
mammalian source (Morrison et al. (1984) Proc. Natl.
Acad. Sci. U.S.A. 81:6851-6855; Neuberger et al.
(1984) Nature 312:604-608; Sahagan et al. (1986) J.
Immunol. 137:1066-1074; EPO application nos.
84302368.0, Genentech; 85102665.8, Research
Development Corporation of Japan; 85305604.2,
Stanford; P.C.T. application no. PCT/GB85/00392,
Celltech Limited).

It has been reported that binding function 
is localized to the variable domains of the antibody 
molecule located at the amino terminal end of both 
the heavy and light chains. The variable regions 
remain noncovalently associated (as V H -V L dimers,
termed Fv regions) even after proteolytic cleavage
from the native antibody molecule, and retain much of
their antigen recognition and binding capabilities
(see, for example, Inbar et al., Proc. Natl. Acad.
Sci. U.S.A. (1972) 69:2659-2662; Hochman et al.
(1973) Biochem. 12:1130-1135; and (1976) Biochem.
15:2706-2710; Sharon and Givol (1976) Biochem.
15:1591-1594; Rosenblatt and Haber (1978) Biochem.
17:3877-3882; Ehrlich et al. (1980) Biochem.
19:4091-4096). Methods of manufacturing two-chain
Fv substantially free of constant region using 
recombinant DNA techniques are disclosed in U.S. 
4,642,334 and corresponding published specification 
EP 088,994. 
Summary of the Invention

In one aspect the invention provides a 
single chain multifunctional biosynthetic protein 
expressed from a single gene derived by recombinant 
DNA techniques. The protein comprises a biosynthetic 
antibody binding site (BABS) comprising at least one 
protein domain capable of binding to a preselected 
antigenic determinant. The amino acid sequence of 
the domain is homologous to at least a portion of the 
sequence of a variable region of an immunoglobulin 
molecule capable of binding the preselected antigenic 
determinant. Peptide bonded to the binding site is a
polypeptide consisting of an effector protein having
a conformation suitable for biological activity in a 
mammal, an amino acid sequence capable of 
sequestering ions, or an amino acid sequence capable 
of selective binding to a solid support. 

In another aspect, the invention provides a
biosynthetic binding site protein comprising a single 
polypeptide chain defining two polypeptide domains 
connected by a polypeptide linker. The amino acid 
sequence of each of the domains comprises a set of 
complementarity determining regions (CDRs) interposed 
between a set of framework regions (FRs), each of 
which is respectively homologous with at least a 
portion of the CDRs and FRs from an immunoglobulin
molecule. At least one of the domains comprises a 
set of CDR amino acid sequences and a set of FR amino 
acid sequences at least partly homologous to 
different immunoglobulins. The two polypeptide 
domains together define a hybrid synthetic binding
site having specificity for a preselected antigen, 
determined by the selected CDRs. 

In still another aspect, the invention 
provides a biosynthetic binding protein comprising a
single polypeptide chain defining two domains 
connected by a polypeptide linker. The amino acid 
sequence of each of the domains comprises a set of 
CDRs interposed between a set of FRs, each of which
is respectively homologous with at least a portion of
the CDRs and FRs from an immunoglobulin molecule. 
The linker comprises plural, peptide-bonded amino 
acids defining a polypeptide of a length sufficient 
to span the distance between the C terminal end of
one of the domains and N terminal end of the other
when the binding protein assumes a conformation
suitable for binding. The linker comprises
hydrophilic amino acids which together preferably
constitute a hydrophilic sequence. Linkers which
assume an unstructured polypeptide configuration in
aqueous solution work well. The binding protein is
capable of binding to a preselected antigenic site,
determined by the collective tertiary structure of
the sets of CDRs held in proper conformation by the
sets of FRs. Preferably, the binding protein has a
specificity at least substantially identical to the
binding specificity of the immunoglobulin molecule
used as a template for the design of the CDR
regions. Such structures can have a binding affinity
of at least 10^6 M^-1, and preferably 10^8 M^-1.

In preferred aspects, the FRs of the binding
protein are homologous to at least a portion of the
FRs from a human immunoglobulin; the linker spans at
least about 40 angstroms; a polypeptide spacer is
incorporated in the multifunctional protein between 
the binding site and the second polypeptide; and the 
binding protein has an affinity for the preselected 
antigenic determinant no less than two orders of 
magnitude less than the binding affinity of the 
immunoglobulin molecule used as a template for the 
CDR regions of the binding protein. The preferred 
linkers and spacers are cysteine-free. The linker
preferably comprises amino acids having unreactive 
side groups, e.g., alanine and glycine. Linkers and 
spacers can be made by combining plural consecutive 
copies of an amino acid sequence, e.g.,
(Gly 4 -Ser) 3. The invention also provides DNAs encoding
these proteins and host cells harboring and capable 
of expressing these DNAs. 
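
By way of illustration only, the construction of a linker from plural consecutive copies of a repeat unit may be sketched as follows. The function names build_linker and is_cysteine_free are hypothetical and introduced solely for this sketch; the one-letter string "GGGGS" is assumed to stand for the Gly 4 -Ser unit named above.

```python
def build_linker(unit: str = "GGGGS", copies: int = 3) -> str:
    """Concatenate `copies` consecutive copies of a repeat unit (one-letter code)."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return unit * copies


def is_cysteine_free(sequence: str) -> bool:
    """Return True when the sequence contains no cysteine (C) residue."""
    return "C" not in sequence.upper()


# (Gly 4 -Ser) 3 written in one-letter code: three copies of GGGGS.
linker = build_linker()
print(linker, len(linker), is_cysteine_free(linker))
```

The sketch yields a 15-residue, cysteine-free sequence; other repeat units or copy numbers can be passed in to explore alternative linkers or spacers.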

As used herein, the phrase biosynthetic 
antibody binding site or BABS means synthetic 
proteins expressed from DNA derived by recombinant 
techniques. BABS comprise biosynthetically produced 
sequences of amino acids defining polypeptides 
designed to bind with a preselected antigenic 
material. The structure of these synthetic 
polypeptides is unlike that of naturally occurring 
antibodies, fragments thereof, e.g., Fv, or known 
synthetic polypeptides or "chimeric antibodies" in 
that the regions of the BABS responsible for 
specificity and affinity of binding, (analogous to 
native antibody variable regions) are linked by 
peptide bonds, expressed from a single DNA, and may 
themselves be chimeric, e.g., may comprise amino acid 
sequences homologous to portions of at least two 
different antibody molecules. The BABS embodying the
invention are biosynthetic in the sense that they are 
synthesized in a cellular host made to express a 
synthetic DNA, that is, a recombinant DNA made by 
ligation of plural, chemically synthesized 
oligonucleotides, or by ligation of fragments of DNA 
derived from the genome of a hybridoma, mature B cell 
clone, or a cDNA library derived from such natural 
sources. The proteins of the invention are properly 
characterized as "binding sites" in that these
synthetic molecules are designed to have specific 
affinity for a preselected antigenic determinant. 
The polypeptides of the invention comprise structures 
patterned after regions of native antibodies known to 
be responsible for antigen recognition. 

Accordingly, it is an object of the 
invention to provide novel multifunctional proteins 
comprising one or more effector proteins and one or 
more biosynthetic antibody binding sites, and to 
provide DNA sequences which encode the proteins. 
Another object is to provide a generalized method for 
producing biosynthetic antibody binding site 
polypeptides of any desired specificity. 
Brief Description of the Drawing

The foregoing and other objects of this 
invention, the various features thereof, as well as 
the invention itself, may be more fully understood 
from the following description, when read together 
with the accompanying drawings. 

Figure 1A is a schematic representation of 
an intact IgG antibody molecule containing two light 
chains, each consisting of one variable and one 
constant domain, and two heavy chains, each 
consisting of one variable and three constant 
domains. Figure 1B is a schematic drawing of the
structure of Fv proteins (and DNA encoding them) 
illustrating V H and V L domains, each of which 
comprises four framework (FR) regions and three 
complementarity determining (CDR) regions. Boundaries 
of CDR3 are indicated, by way of example, for 
monoclonal 26-10, a well known and characterized 
murine monoclonal specific for digoxin.

Figure 2A-2E are schematic representations 
of some of the classes of reagents constructed in 
accordance with the invention, each of which 
comprises a biosynthetic antibody binding site. 

Figure 3 discloses five amino acid sequences 
(heavy chains) in single letter code lined up 
vertically to facilitate understanding of the 
invention. Sequence 1 is the known native sequence 
of V H from murine monoclonal glp-4
(anti-lysozyme). Sequence 2 is the known native
sequence of V H from murine monoclonal 26-10
(anti-digoxin). Sequence 3 is a BABS comprising the
FRs from 26-10 V H and the CDRs from glp-4 V H.
The CDRs are identified in lower case letters;
restriction sites in the DNA used to produce chimeric
sequence 3 are also identified. Sequence 4 is the
known native sequence of V H from human myeloma
antibody NEWM. Sequence 5 is a BABS comprising the
FRs from NEWM V H and the CDRs from glp-4 V H,
i.e., illustrates a "humanized" binding site having a
human framework but an affinity for lysozyme similar 
to murine glp-4. 

Figures 4A-4F are the synthetic nucleic acid 
sequences and encoded amino acid sequences of (4A) 
the heavy chain variable domain of murine 
anti-digoxin monoclonal 26-10; (4B) the light chain
variable domain of murine anti-digoxin monoclonal
26-10; (4C) a heavy chain variable domain of a BABS
comprising CDRs of glp-4 and FRs of 26-10; (4D) a 
light chain variable region of the same BABS; (4E) a 
heavy chain variable region of a BABS comprising CDRs 
of glp-4 and FRs of NEWM; and (4F) a light chain 
variable region comprising CDRs of glp-4 and FRs of 
NEWM. Delineated are FRs, CDRs, and restriction 
sites for endonuclease digestion, most of which were
introduced during design of the DNA. 
Figure 5 is the nucleic acid and encoded
amino acid sequence of a host DNA (V H) designed to
facilitate insertion of CDRs of choice. The DNA was 
designed to have unique 6-base sites directly 
flanking the CDRs so that relatively small 
oligonucleotides defining portions of CDRs can be 
readily inserted, and to have other sites to 
facilitate manipulation of the DNA to optimise 
binding properties in a given construct. The 
framework regions of the molecule correspond to 
murine FRs (Figure 4A).

Figures 6A and 6B are multifunctional 
proteins (and DNA encoding them) comprising a single 
chain BABS with the specificity of murine monoclonal
26-10, linked through a spacer to the FB fragment of 
protein A, here fused as a leader, and constituting a 
binding site for Fc. The spacer comprises the 11 
C-terminal amino acids of the FB followed by Asp-Pro 
(a dilute acid cleavage site). The single chain BABS 
comprises sequences mimicking the V H and V L (6A)
and the V L and V H (6B) of murine monoclonal
26-10. The V L in construct 6A is altered at
residue 4 where valine replaces methionine present in
the parent 26-10 sequence. These constructs contain
binding sites for both Fc and digoxin. Their
structure may be summarized as:

(6A) FB-Asp-Pro-V H -(Gly 4 -Ser) 3 -V L , 

and 

(6B) FB-Asp-Pro-V L -(Gly 4 -Ser) 3 -V H ,
where (Gly 4 -Ser) 3 is a polypeptide linker.
In Figures 4A-4E and 6A and 6B, the amino
acid sequence of the expression products starts after
the GAATTC sequence, which codes for an EcoRI splice
site, translated as Glu-Phe on the drawings.
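
The reading of the EcoRI site as two codons can be checked with a short sketch. The helper translate and the two-entry table CODONS are hypothetical names introduced only for this illustration; the table holds just the two standard-code codons needed here and is not a full genetic code.

```python
# Only the two codons present in the EcoRI recognition sequence GAATTC.
CODONS = {"GAA": "Glu", "TTC": "Phe"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string codon by codon into three-letter residues."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "-".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("GAATTC"))  # Glu-Phe
```

Read in frame, GAA encodes glutamic acid and TTC encodes phenylalanine, which is why the site appears as Glu-Phe at the start of the drawn sequences.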

Figure 7A is a graph of percent of maximum
counts bound of radioiodinated digoxin versus 
concentration of binding protein adsorbed to the 
plate comparing the binding of native 26-10 (curve 1) 
and the construct of Figure 6A and Figure 2B 
renatured using two different procedures (curves 2 
and 3). Figure 7B is a graph demonstrating the 
bifunctionality of the FB-(26-10) BABS adhered to 
microtiter plates through the specific binding of the 
binding site to the digoxin-BSA coat on the plate. 
Figure 7B shows the percent inhibition of 
125 I-rabbit-IgG binding to the FB domain of the FB 
BABS by the addition of IgG, protein A, FB, murine 
IgG2a, and murine IgG1.

Figure 8 is a schematic representation of a 
model assembled DNA sequence encoding a 
multifunctional biosynthetic protein comprising a
leader peptide (used to aid expression and thereafter 
cleaved), a binding site, a spacer, and an effector 
molecule attached as a trailer sequence. 

Figure 9A-9E are exemplary synthetic nucleic 
acid sequences and corresponding encoded amino acid 
sequences of binding sites of different 
specificities: (A) FRs from NEWM and CDRs from 26-10 
having the digoxin specificity of murine monoclonal 
26-10; (B) FRs from 26-10, and CDRs from G-loop-4 
(glp-4) having lysozyme specificity; (C) FRs and CDRs
from MOPC-315 having dinitrophenol (DNP) specificity;
(D) FRs and CDRs from an anti-CEA monoclonal
antibody; (E) FRs in both V H and V L and CDR 1
and CDR 3 in V H, and CDR 1, CDR 2, and CDR 3 in
V L from an anti-CEA monoclonal antibody; CDR 2 in
V H is a CDR 2 consensus sequence found in most
immunoglobulin regions. 

Figure 10A is a schematic representation of 
the DNA and amino acid sequence of a leader peptide 
(MLE) protein with corresponding DNA sequence and 
some major restriction sites. Figure 10B shows the 
design of an expression plasmid used to express 
MLE-BABS (26-10). During construction of the gene,
fusion partners were joined at the EcoRI site that is
shown as part of the leader sequence. The pBR322
plasmid, opened at the unique SspI and PstI sites,
was combined in a 3-part ligation with an SspI to
EcoRI fragment bearing the trp promoter and MLE
leader and with an EcoRI to PstI fragment carrying 
the BABS gene. The resulting expression vector 
confers tetracycline resistance on positive 
transformants.

Figure 11 is an SDS-polyacrylamide gel (15%) 
of the (26-10) BABS at progressive stages of 
purification. Lane 0 shows low molecular weight 
standards; lane 1 is the MLE-BABS fusion protein;
lane 2 is an acid digest of this material; lane 3 is 
the pooled DE-52 chromatographed protein; lanes 4 and 
5 are the same ouabain-Sepharose pool of single chain
BABS except that lane 4 protein is reduced and lane 5 
protein is unreduced. 

Figure 12 shows inhibition curves for 26-10 
BABS and 26-10 Fab species, and indicates the 
relative affinities of the antibody fragment for the 
indicated cardiac glycosides. 

Figures 13A and 13B are plots of digoxin 
binding curves. (A) shows 26-10 BABS binding 
isotherm and Sips plot (inset), and (B) shows 26-10 
Fab binding isotherm and Sips plot (inset). 

Figure 14 is a nucleic acid sequence and 
corresponding amino acid sequence of a modified FB 
dimer leader sequence and various restriction sites. 

Figure 15A-15H are nucleic acid sequences 
and corresponding amino acid sequences of 
biosynthetic multifunctional proteins including a 
single chain BABS and various biologically active 
protein trailers linked via a spacer sequence. Also 
indicated are various endonuclease digestion sites.
The trailing sequences are (A) epidermal growth 
factor (EGF); (B) streptavidin; (C) tumor necrosis 
factor (TNF); (D) calmodulin; (E) platelet derived 
growth factor-beta (PDGF-beta); (F) ricin; (G)
interleukin-2; and (H) an FB-FB dimer.
Description

The invention will first be described in its 
broadest overall aspects with a more detailed 
description following. 

A class of novel biosynthetic, bi or 
multifunctional proteins has now been designed and 
engineered which comprise biosynthetic antibody 
binding sites, that is, "BABS" or biosynthetic 
polypeptides defining structure capable of selective 
antigen recognition and preferential antigen binding, 
and one or more peptide-bonded additional protein or 
polypeptide regions designed to have a preselected 
property. Examples of the second region include 
amino acid sequences designed to sequester ions, 
which makes the protein suitable for use as an 
imaging agent, and sequences designed to facilitate 
immobilization of the protein for use in affinity 
chromatography and solid phase immunoassay. Another 
example of the second region is a bioactive effector
molecule, that is, a protein having a conformation 
suitable for biological activity, such as an enzyme, 
toxin, receptor, binding site, growth factor, cell 
differentiation factor, lymphokine, cytokine, 
hormone, or anti-metabolite. This invention features 
synthetic, multifunctional proteins comprising these 
regions peptide bonded to one or more biosynthetic 
antibody binding sites, synthetic, single chain 
proteins designed to bind preselected antigenic 
determinants with high affinity and specificity, 
constructs containing multiple binding sites linked 
together to provide multipoint antigen binding and
high net affinity and specificity, DNA encoding these
proteins prepared by recombinant techniques, host
cells harboring these DNAs, and methods for the
production of these proteins and DNAs.

The invention requires recombinant
production of single chain binding sites having
affinity and specificity for a predetermined
antigenic determinant. This technology has been
developed and is disclosed herein. In view of this
disclosure, persons skilled in recombinant DNA
technology, protein design, and protein chemistry can
produce such sites which, when disposed in solution,
have high binding constants (at least 10^6,
preferably 10^8 M^-1) and excellent specificity.
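
To make the stated binding constants concrete, the following sketch converts an association constant (in M^-1) to the corresponding dissociation constant (in M) using the standard reciprocal relationship, and expresses how far a construct's affinity falls below that of a template immunoglobulin in orders of magnitude. All function names and the example values are hypothetical and are not measurements from this disclosure.

```python
import math


def dissociation_constant(ka_per_molar: float) -> float:
    """Kd (M) is the reciprocal of the association constant Ka (M^-1)."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    return 1.0 / ka_per_molar


def orders_below(ka_construct: float, ka_template: float) -> float:
    """Orders of magnitude by which the construct's Ka falls below the template's Ka."""
    return math.log10(ka_template / ka_construct)


def within_two_orders(ka_construct: float, ka_template: float) -> bool:
    """True when the construct is no more than 100-fold weaker than the template."""
    return orders_below(ka_construct, ka_template) <= 2.0 + 1e-9


# Illustrative values only: Ka of 10^6 and 10^8 M^-1.
print(dissociation_constant(1e6), dissociation_constant(1e8))
print(within_two_orders(1e7, 1e9))
```

Under this relationship a binding constant of 10^6 M^-1 corresponds to a micromolar dissociation constant, and 10^8 M^-1 to about 10 nanomolar.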

The design of the BABS is based on the 
observation that three subregions of the variable 
domain of each of the heavy and light chains of 
native immunoglobulin molecules collectively are 
responsible for antigen recognition and binding. 
Each of these subregions, called herein 
"complementarity determining regions" or CDRs,
consists of one of the hypervariable regions or loops
and of selected amino acids or amino acid sequences
disposed in the framework regions or FRs which flank 
that particular hypervariable region. It has now 
been discovered that FRs from diverse species are 
effective to maintain CDRs from diverse other species
in proper conformation so as to achieve true 
immunochemical binding properties in a biosynthetic 
protein. It has also been discovered that 
biosynthetic domains mimicking the structure of the
two chains of an immunoglobulin binding site may be
connected by a polypeptide linker while closely
approaching, retaining, and often improving their
collective binding properties.

The binding site region of the 
multifunctional proteins comprises at least one, and 
preferably two domains, each of which has an amino 
acid sequence homologous to portions of the CDRs of
the variable domain of an immunoglobulin light or
heavy chain, and other sequence homologous to the FRs 
of the variable domain of the same, or a second, 
different immunoglobulin light or heavy chain. The 
two domain binding site construct also includes a 
polypeptide linking the domains. Polypeptides so 
constructed bind a specific preselected antigen 
determined by the CDRs held in proper conformation by 
the FRs and the linker. Preferred structures have 
human FRs, i.e., mimic the amino acid sequence of at
least a portion of the framework regions of a human
immunoglobulin, and have linked domains which
together comprise structure mimicking a V H -V L or
V L -V H immunoglobulin two-chain binding site. CDR
regions of a mammalian immunoglobulin, such as those
of mouse, rat, or human origin are preferred. In one
preferred embodiment, the biosynthetic antibody 
binding site comprises FRs homologous with a portion 
of the FRs of a human immunoglobulin and CDRs 
homologous with CDRs from a mouse or rat 
immunoglobulin. This type of chimeric polypeptide 
displays the antigen binding specificity of the mouse 
or rat immunoglobulin, while its human framework 
minimizes human immune reactions. In addition, the
chimeric polypeptide may comprise other amino acid
sequences. It may comprise, for example, a sequence
homologous to a portion of the constant domain of an
immunoglobulin, but preferably is free of constant
regions (other than FRs).
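
The arrangement of a chimeric variable domain, in which four FRs from one immunoglobulin hold three CDRs from another, may be sketched as a simple interleaving of segments. The function graft and the short placeholder strings below are hypothetical; they are not sequences of 26-10, glp-4, NEWM, or any other antibody. CDRs are emitted in lower case, following the convention used for the aligned sequences of Figure 3.

```python
from typing import Sequence


def graft(frameworks: Sequence[str], cdrs: Sequence[str]) -> str:
    """Interleave segments as FR1-CDR1-FR2-CDR2-FR3-CDR3-FR4.

    FRs are written in upper case and CDRs in lower case.
    """
    if len(frameworks) != 4 or len(cdrs) != 3:
        raise ValueError("expected four framework regions and three CDRs")
    parts = []
    for fr, cdr in zip(frameworks, cdrs):  # pairs FR1..FR3 with CDR1..CDR3
        parts.append(fr.upper())
        parts.append(cdr.lower())
    parts.append(frameworks[3].upper())    # closing FR4
    return "".join(parts)


# Placeholder segments only: FRs notionally from one immunoglobulin,
# CDRs notionally from a second, different immunoglobulin.
human_like_frs = ["AAAA", "TTTT", "SSSS", "VVVV"]
murine_like_cdrs = ["gy", "nd", "wk"]
print(graft(human_like_frs, murine_like_cdrs))
```

In practice the choice of FR and CDR boundaries, and of any framework residues retained with a CDR, is a design decision that this sketch does not attempt to capture.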

The binding site region(s) of the chimeric
proteins are thus single chain composite polypeptides
comprising a structure which in solution behaves like
an antibody binding site. The two domain, single
chain composite polypeptide has a structure patterned
after tandem V H and V L domains, but with the
carboxyl terminal of one attached through a linking
amino acid sequence to the amino terminal of the
other. The linking amino acid sequence may or may
not itself be antigenic or biologically active. It
preferably spans a distance of at least about 40 Å,
i.e., comprises at least about 14 amino acids, and
comprises residues which together present a
hydrophilic, relatively unstructured region. Linking
amino acid sequences having little or no secondary
structure work well. Optionally, one or a pair of
unique amino acids or amino acid sequences
recognizable by a site specific cleavage agent may be
included in the linker. This permits the V H and
V L -like domains to be separated after expression,
or the linker to be excised after refolding of the
binding site.
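
The relationship between linker span and residue count can be estimated with a rough calculation. The sketch assumes an effective rise of about 3.0 Å per residue, a value chosen here only because it reproduces the pairing of about 40 Å with about 14 amino acids stated above; it is not a figure given in this disclosure, and the function names are hypothetical.

```python
import math

ASSUMED_RISE_ANGSTROM = 3.0  # assumption for illustration, not from the disclosure


def min_residues(span_angstrom: float, rise: float = ASSUMED_RISE_ANGSTROM) -> int:
    """Smallest residue count whose estimated span reaches span_angstrom."""
    if span_angstrom <= 0 or rise <= 0:
        raise ValueError("span and rise must be positive")
    return math.ceil(span_angstrom / rise)


def estimated_span(n_residues: int, rise: float = ASSUMED_RISE_ANGSTROM) -> float:
    """Estimated end-to-end span, in angstroms, of an extended linker."""
    return n_residues * rise


print(min_residues(40.0))     # residues needed to span about 40 angstroms
print(estimated_span(15))     # a 15-residue linker such as three Gly 4 -Ser copies
```

Under this assumption a 15-residue linker comfortably exceeds the preferred minimum span; a different assumed rise per residue would shift the estimate accordingly.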

Either the amino or carboxyl terminal ends 
(or both ends) of these chimeric, single chain 
binding sites are attached to an amino acid sequence 
which itself is bioactive or has some other function 
to produce a bifunctional or multifunctional
protein. For example, the synthetic binding site may
include a leader and/or trailer sequence defining a
polypeptide having enzymatic activity, independent
affinity for an antigen different from the antigen to
which the binding site is directed, or having other 
functions such as to provide a convenient site of 
attachment for a radioactive ion, or to provide a 
residue designed to link chemically to a solid 
support. This fused, independently functional 
section of protein should be distinguished from fused 
leaders used simply to enhance expression in 
prokaryotic host cells or yeasts. The 
multifunctional proteins also should be distinguished 
from the "conjugates" disclosed in the prior art 
comprising antibodies which, after expression, are 
linked chemically to a second moiety. 

Often, a series of amino acids designed as a 
"spacer" is interposed between the active regions of 
the multifunctional protein. Use of such a spacer 
can promote independent refolding of the regions of 
the protein. The spacer also may include a specific 
sequence of amino acids recognized by an
endopeptidase, for example, endogenous to a target
cell (e.g., one having a surface protein recognized
by the binding site) so that the bioactive effector 
protein is cleaved and released at the target. The 
second functional protein preferably is present as a 
trailer sequence, as trailers exhibit less of a 
tendency to interfere with the binding behavior of 
the BABS. 
The therapeutic use of such "self-targeted"
bioactive proteins offers a number of advantages over
conjugates of immunoglobulin fragments or complete 
antibody molecules: they are stable, less 
immunogenic and have a lower molecular weight; they 
can penetrate body tissues more rapidly for purposes 
of imaging or drug delivery because of their smaller 
size; and they can facilitate accelerated clearance 
of targeted isotopes or drugs. Furthermore, because 
design of such structures at the DNA level as 
disclosed herein permits ready selection of 
bioproperties and specificities, an essentially 
limitless combination of binding sites and bioactive 
proteins is possible, each of which can be refined as 
disclosed herein to optimize independent activity at 
each region of the synthetic protein. The synthetic 
proteins can be expressed in procaryotes such as E.
coli, and thus are less costly to produce than
immunoglobulins or fragments thereof which require 
expression in cultured animal cell lines. 

The invention thus provides a family of 
recombinant proteins expressed from a single piece of 
DNA, all of which have the capacity to bind 
specifically with a predetermined antigenic 
determinant. The preferred species of the proteins
comprise a second domain which functions 
independently of the binding region. In this aspect 
the invention provides an array of "self-targeted" 
proteins which have a bioactive function and which 
deliver that function to a locus determined by the 
binding site's specificity. It also provides 
biosynthetic binding proteins having attached 
polypeptides suitable for attachment to
immobilization matrices which may be used in affinity
chromatography and solid phase immunoassay
applications, or suitable for attachment to ions,
e.g., radioactive ions, which may be used for in vivo 
imaging. 

The successful design and manufacture of the 
proteins of the invention depends on the ability to 
produce biosynthetic binding sites, and most
preferably, sites comprising two domains mimicking 
the variable domains of immunoglobulin connected by a 
linker. 

As is now well known, Fv, the minimum 
antibody fragment which contains a complete antigen 
recognition and binding site, consists of a dimer of
one heavy and one light chain variable domain in
noncovalent association (Figure 1A). It is in this
configuration that the three complementarity 
determining regions of each variable domain interact 
to define an antigen binding site on the surface of 
the V u -V. dimer. Collectively, the six 
complementarity determining regions (see Figure IB) 
confer antigen binding specificity to the antibody. 
FRs flanking the CDRs have a tertiary structure which 
is essentially conserved in native immunoglobulins of 
species as diverse as human and mouse. These FRs 
serve to hold the CDRs in their appropriate 
orientation. The constant domains are not required 
for binding function, but may aid in stabilizing 
V H -V L interaction. Even a single variable domain
(or half of an Fv comprising only three CDRs specific
for an antigen) has the ability to recognize and bind
antigen, although at a lower affinity than an entire
binding site (Painter et al. (1972) Biochem. 
11:1327-1337). 

This knowledge of the structure of 
immunoglobulin proteins has now been exploited to 
develop multifunctional fusion proteins comprising 
biosynthetic antibody binding sites and one or more 
other domains. 

The structure of these biosynthetic proteins 
in the region which imparts the binding properties to
the protein is analogous to the Fv region of a
natural antibody. It comprises at least one, and
preferably two domains consisting of amino acids
defining V H - and V L -like polypeptide segments
connected by a linker which together form the 
tertiary molecular structure responsible for affinity 
and specificity. Each domain comprises a set of 
amino acid sequences analogous to immunoglobulin CDRs 
held in appropriate conformation by a set of 
sequences analogous to the framework regions (FRs) of 
an Fv fragment of a natural antibody. 

The term CDR, as used herein, refers to 
amino acid sequences which together define the 
binding affinity and specificity of the natural Fv 
region of a native immunoglobulin binding site, or a
synthetic polypeptide which mimics this function. 
CDRs typically are not wholly homologous to 
hypervariable regions of natural Fvs, but rather also 
may include specific amino acids or amino acid 
sequences which flank the hypervariable region and 
have heretofore been considered framework not
directly determinative of complementarity. The term
FR, as used herein, refers to amino acid sequences 
flanking or interposed between CDRs. 

The CDR and FR polypeptide segments are 
designed based on sequence analysis of the Fv region 
of preexisting antibodies or of the DNA encoding 
them. In one embodiment, the amino acid sequences 
constituting the FR regions of the BABS are analogous 
to the FR sequences of a first preexisting antibody, 
for example, a human IgG. The amino acid sequences 
constituting the CDR regions are analogous to the 
sequences from a second, different preexisting 
antibody, for example, the CDRs of a murine IgG.
Alternatively, the CDRs and FRs from a single
preexisting antibody from, e.g., an unstable or hard
to culture hybridoma, may be copied in their entirety. 

Practice of the invention enables the design 
and biosynthesis of various reagents, all of which 
are characterized by a region having affinity for a 
preselected antigenic determinant. The binding site 
and other regions of the biosynthetic protein are 
designed with the particular planned utility of the 
protein in mind. Thus, if the reagent is designed 
for intravascular use in mammals, the FR regions may 
comprise amino acids similar or identical to at least
a portion of the framework region amino acids of
antibodies native to that mammalian species. On the
other hand, the amino acids comprising the CDRs may
be analogous to a portion of the amino acids from the
hypervariable region (and certain flanking amino 
acids) of an antibody having a known affinity and 
specificity, e.g., a murine or rat monoclonal 
antibody.

Other sections of native immunoglobulin
protein structure, e.g., C H and C L , need not be
present and normally are intentionally omitted from 
the biosynthetic proteins. However, the proteins of 
the invention normally comprise additional 
polypeptide or protein regions defining a bioactive 
region, e.g., a toxin or enzyme, or a site onto which 
a toxin or a remotely detectable substance can be 
attached. 

The invention thus can provide intact 
biosynthetic antibody binding sites analogous to
V H -V L dimers, either non-covalently associated,
disulfide bonded, or preferably linked by a
polypeptide sequence to form a composite V H -V L or
V L -V H polypeptide which may be essentially free
of antibody constant region. The invention also 
provides proteins analogous to an independent V H or 
V L domain, or dimers thereof. Any of these 
proteins may be provided in a form linked to, for 
example, amino acids analogous or homologous to a 
bioactive molecule such as a hormone or toxin. 

Connecting the independently functional 
regions of the protein is a spacer comprising a short 
amino acid sequence whose function is to separate the
functional regions so that they can independently
assume their active tertiary conformation. The 
spacer can consist of an amino acid sequence present 
on the end of a functional protein which sequence is 
not itself required for its function, and/or specific 
sequences engineered into the protein at the DNA
level.

The spacer generally may comprise between 5
and 25 residues. Its optimal length may be
determined using constructs of different spacer
lengths varying, for example, by units of 5 amino
acids. The specific amino acids in the spacer can 
vary. Cysteines should be avoided. Hydrophilic 
amino acids are preferred. The spacer sequence may 
mimic the sequence of a hinge region of an 
immunoglobulin. It may also be designed to assume a 
structure, such as a helical structure. Proteolytic 
cleavage sites may be designed into the spacer 
separating the variable region-like sequences from 
other pendant sequences so as to facilitate cleavage 
of intact BABS, free of other protein, or so as to 
release the bioactive protein in vivo.
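
The spacer guidelines stated above (between 5 and 25 residues, cysteines avoided, hydrophilic amino acids preferred) can be written as a simple screening check. The following is a minimal sketch only: the function names, the particular set of residues treated as hydrophilic, and the example sequence are illustrative assumptions and are not part of the disclosure.

```python
# Residues treated as hydrophilic here (one-letter codes). This set is an
# illustrative assumption; the text states only that hydrophilic amino
# acids are preferred.
HYDROPHILIC = set("DEKRNQSTH")


def check_spacer(seq, min_len=5, max_len=25):
    """Return a list of guideline violations for a candidate spacer."""
    problems = []
    if not min_len <= len(seq) <= max_len:
        problems.append(f"length {len(seq)} is outside {min_len}-{max_len}")
    if "C" in seq:
        problems.append("contains cysteine")
    return problems


def hydrophilic_fraction(seq):
    """Fraction of residues that fall in the assumed hydrophilic set."""
    if not seq:
        return 0.0
    return sum(residue in HYDROPHILIC for residue in seq) / len(seq)


# Hypothetical candidate spacer, used only to exercise the checks.
candidate = "SDSNTSKEQS"
print(check_spacer(candidate), hydrophilic_fraction(candidate))
```

An empty list from `check_spacer` means only that the length and cysteine guidelines are met; suitability of a spacer in a given construct would still have to be determined empirically, as the text describes.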

Figures 2A-2E illustrate five examples of 
protein structures embodying the invention that can 
be produced by following the teaching disclosed 
herein. All are characterized by a biosynthetic
polypeptide defining a binding site 3, comprising
amino acid sequences comprising CDRs and FRs, often
derived from different immunoglobulins, or sequences
homologous to a portion of CDRs and FRs from
different immunoglobulins. Figure 2A depicts a
single chain construct comprising a polypeptide 
domain 10 having an amino acid sequence analogous to 
the variable region of an immunoglobulin heavy chain, 
bound through its carboxyl end to a polypeptide 
linker 12, which in turn is bound to a polypeptide 
domain 14 having an amino acid sequence analogous to
the variable region of an immunoglobulin light
chain. Of course, the light and heavy chain domains 
may be in reverse order. Alternatively, the binding 
site may comprise two substantially homologous amino 
acid sequences which are both analogous to the 
variable region of an immunoglobulin heavy or light 
chain. 

The linker 12 should be long enough (e.g.,
about 15 amino acids or about 40 Å) to permit the
chains 10 and 14 to assume their proper
conformation. The linker 12 may comprise an amino
acid sequence homologous to a sequence identified as
"self" by the species into which it will be
introduced, if drug use is intended. For example, 
the linker may comprise an amino acid sequence 
patterned after a hinge region of an immunoglobulin. 
The linker preferably comprises hydrophilic amino 
acid sequences. It may also comprise a bioactive 
polypeptide such as a cell toxin which is to be 
targeted by the binding site, or a segment easily 
labelled by a radioactive reagent which is to be 
delivered, e.g., to the site of a tumor comprising an 
epitope recognized by the binding site. The linker 
may also include one or two built-in cleavage sites,
i.e., an amino acid or amino acid sequence
susceptible to attack by a site specific cleavage
agent as described below. This strategy permits the
V H - and V L -like domains to be separated after
expression, or the linker to be excised after folding 
while retaining the binding site structure in 
non-covalent association. The amino acids of the
linker preferably are selected from among those
having relatively small, unreactive side chains. 
Alanine, serine, and glycine are preferred. 

Generally, the design of the linker involves 
considerations similar to the design of the spacer, 
excepting that binding properties of the linked 
domains are seriously degraded if the linker sequence 
is shorter than about 20 Å in length, i.e., comprises
fewer than about 10 residues. Linkers longer than the
approximate 40 Å distance between the N-terminus of a
native variable region and the C-terminus of its
sister chain may be used, but also potentially can 
diminish the BABS binding properties. Linkers 
comprising between 12 and 18 residues are preferred. 
The preferred length in specific constructs may be 
determined by varying linker length first by units of 
5 residues, and second by units of 1-4 residues after 
determining the best multiple of the pentameric 
starting units. 
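
The two-stage search for a preferred linker length described above (first by units of 5 residues, then by units of 1-4 residues around the best pentameric multiple) can be sketched as follows. This is an illustration under stated assumptions: the function names, the default range of 5 to 25 residues, and the repeating glycine/serine motif are hypothetical choices, the motif being drawn from the small residues the text prefers.

```python
def coarse_lengths(shortest=5, longest=25, step=5):
    """Linker lengths for the first pass, varied in units of `step`."""
    return list(range(shortest, longest + 1, step))


def fine_lengths(best_coarse, max_offset=4):
    """Lengths within 1..max_offset residues of the best coarse length."""
    return [
        n
        for n in range(best_coarse - max_offset, best_coarse + max_offset + 1)
        if n > 0 and n != best_coarse
    ]


def make_linker(length, motif="GGGGS"):
    """Build a linker of `length` residues by repeating `motif`.

    The motif is a hypothetical choice made for illustration only.
    """
    repeats = -(-length // len(motif))  # ceiling division
    return (motif * repeats)[:length]


# First pass: constructs at each coarse length would be expressed and tested.
print(coarse_lengths())
# Second pass: suppose (hypothetically) 15 residues bound best in pass one.
print([make_linker(n) for n in fine_lengths(15)])
```

Which length is "best" at each stage is a binding measurement made on expressed constructs; the sketch only enumerates the candidates to be built.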

Additional proteins or polypeptides may be
attached to either or both the amino or carboxyl
termini of the binding site to produce
multifunctional proteins of the type illustrated in
Figures 2B-2E. As an example, in Figure 2B, a
helically coiled polypeptide structure 16 comprises a
protein A fragment (FB) linked to the amino terminal
end of a V H -like domain 10 via a spacer 18. Figure
2C illustrates a bifunctional protein having an
effector polypeptide 20 linked via spacer 22 to the 
carboxyl terminus of polypeptide 14 of binding 
protein segment 2. This effector polypeptide 20 may
consist of, for example, a toxin, therapeutic drug,
binding protein, enzyme or enzyme fragment, site of 
attachment for an imaging agent (e.g., to chelate a 
radioactive ion such as indium), or site of selective 
attachment to an immobilization matrix so that the 
BABS can be used in affinity chromatography or solid 
phase binding assay. This effector alternatively may 
be linked to the amino terminus of polypeptide 10, 
although trailers are preferred. Figure 2D depicts a 
trifunctional protein comprising a linked pair of 
BABS 2 having another distinct protein domain 20 
attached to the N-terminus of the first binding
protein segment. Use of multiple BABS in a single
protein enables production of constructs having very
high selective affinity for multiepitopic sites such
as cell surface proteins.

The independently functional domains are 
attached by a spacer 18 (Figs. 2B and 2D) covalently
linking the C-terminus of the protein 16 or 20 to the
N-terminus of the first domain 10 of the binding 
protein segment 2, or by a spacer 22 linking the 
C-terminus of the second binding domain 14 to the 
N-terminus of another protein (Figs. 2C and 2D). The 
spacer may be an amino acid sequence analogous to 
linker sequence 12, or it may take other forms. As 
noted above, the spacer's primary function is to 
separate the active protein regions to promote their 
independent bioactivity and permit each region to 
assume its bioactive conformation independent of 
interference from its neighboring structure.

Figure 2E depicts another type of reagent,
comprising a BABS having only one set of three CDRs, 
e.g., analogous to a heavy chain variable region, 
which retains a measure of affinity for the antigen. 
Attached to the carboxyl end of the polypeptide 10 or 
14 comprising the FR and CDR sequences constituting 
the binding site 3 through spacer 22 is effector 
polypeptide 20 as described above. 

As is evident from the foregoing, the
invention provides a large family of reagents
comprising proteins, at least a portion of which
defines a binding site patterned after the variable
region of an immunoglobulin. It will be apparent
that the nature of any protein fragments linked to
the BABS, and used for reagents embodying the
invention, is essentially unlimited, the essence of
the invention being the provision, either alone or 
linked to other proteins, of binding sites having 
specificities to any antigen desired. 

The clinical administration of 
multifunctional proteins comprising a BABS, or a BABS 
alone, affords a number of advantages over the use of 
intact natural or chimeric antibody molecules, 
fragments thereof, and conjugates comprising such 
antibodies linked chemically to a second bioactive 
moiety. The multifunctional proteins described 
herein offer fewer cleavage sites to circulating 
proteolytic enzymes, their functional domains are 
connected by peptide bonds to polypeptide linker or 
spacer sequences, and thus the proteins have improved 
stability. Because of their smaller size and 
efficient design, the multifunctional proteins
described herein reach their target tissue more
rapidly, and are cleared more quickly from the body.
They also have reduced immunogenicity. In addition,
their design facilitates coupling to other moieties
in drug targeting and imaging applications. Such
coupling may be conducted chemically after expression 
of the BABS to a site of attachment for the coupling 
product engineered into the protein at the DNA
level. Active effector proteins having toxic,
enzymatic, binding, modulating, cell differentiating,
hormonal, or other bioactivity are expressed from a
single DNA as a leader and/or trailer sequence,
peptide bonded to the BABS. 

Design and Manufacture

The proteins of the invention are designed 
at the DNA level. The chimeric or synthetic DNAs are 
then expressed in a suitable host system, and the 
expressed proteins are collected and renatured if 
necessary. A preferred general structure of the DNA 
encoding the proteins is set forth in Figure 8. As 
illustrated, it encodes an optimal leader sequence 
used to promote expression in procaryotes having a
built-in cleavage site recognizable by a site 
specific cleavage agent, for example, an 
endopeptidase, used to remove the leader after 
expression. This is followed by DNA encoding a
V H -like domain, comprising CDRs and FRs, a linker,
a V L -like domain, again comprising CDRs and FRs, a
spacer, and an effector protein. After expression,
folding, and cleavage of the leader, a bifunctional
protein is produced having a binding region whose 
specificity is determined by the CDRs, and a 
peptide-linked independently functional effector
region. 

The ability to design the BABS of the 
invention depends on the ability to determine the 
sequence of the amino acids in the variable region of 
monoclonal antibodies of interest, or the DNA
encoding them. Hybridoma technology enables 
production of cell lines secreting antibody to 
essentially any desired substance that produces an 
immune response. RNA encoding the light and heavy 
chains of the immunoglobulin can then be obtained 
from the cytoplasm of the hybridoma. The 5' end 
portion of the mRNA can be used to prepare cDNA for 
subsequent sequencing, or the amino acid sequence of 
the hypervariable and flanking framework regions can 
be determined by amino acid sequencing of the V 
region fragments of the H and L chains. Such 
sequence analysis is now conducted routinely. This 
knowledge, coupled with observations and deductions 
of the generalized structure of immunoglobulin Fvs, 
permits one to design synthetic genes encoding FR and 
CDR sequences which likely will bind the antigen. 
These synthetic genes are then prepared using known 
techniques, or using the technique disclosed below, 
inserted into a suitable host, and expressed, and the 
expressed protein is purified. Depending on the host 
cell, renaturation techniques may be required to 
attain proper conformation. The various proteins are 
then tested for binding ability, and one having
appropriate affinity is selected for incorporation
into a reagent of the type described above. If
necessary, point substitutions seeking to optimize
binding may be made in the DNA using conventional
cassette mutagenesis or other protein engineering
methodology such as is disclosed below. 

Preparation of the proteins of the invention 
also is dependent on knowledge of the amino acid 
sequence (or corresponding DNA or RNA sequence) of 
bioactive proteins such as enzymes, toxins, growth
factors, cell differentiation factors, receptors,
anti-metabolites, hormones or various cytokines or 
lymphokines. Such sequences are reported in the 
literature and available through computerized data 
banks. 

The DNA sequences of the binding site and 
the second protein domain are fused using 
conventional techniques, or assembled from 
synthesized oligonucleotides, and then expressed 
using equally conventional techniques. 

The processes for manipulating, amplifying, 
and recombining DNA which encode amino acid sequences 
of interest are generally well known in the art, and 
therefore, not described in detail herein. Methods 
of identifying and isolating genes encoding 
antibodies of interest are well understood, and
described in the patent and other literature. In 
general, the methods involve selecting genetic 
material coding for amino acids which define the 
proteins of interest, including the CDRs and FRs of
interest, according to the genetic code.

Accordingly, the construction of DNAs
encoding proteins as disclosed herein can be done
using known techniques involving the use of various
restriction enzymes which make sequence specific cuts 
in DNA to produce blunt ends or cohesive ends, DNA 
ligases, techniques enabling enzymatic addition of 
sticky ends to blunt-ended DNA, construction of 
synthetic DNAs by assembly of short or medium length 
oligonucleotides, cDNA synthesis techniques, and 
synthetic probes for isolating immunoglobulin or 
other bioactive protein genes. Various promoter
sequences and other regulatory DNA sequences used in 
achieving expression, and various types of host cells 
are also known and available. Conventional 
transfection techniques, and equally conventional 
techniques for cloning and subcloning DNA are useful 
in the practice of this invention and known to those 
skilled in the art. Various types of vectors may be 
used such as plasmids and viruses including animal
viruses and bacteriophages. The vectors may exploit 
various marker genes which impart to a successfully 
transfected cell a detectable phenotypic property 
that can be used to identify which of a family of 
clones has successfully incorporated the recombinant 
DNA of the vector. 

One method for obtaining DNA encoding the 
proteins disclosed herein is by assembly of synthetic 
oligonucleotides produced in a conventional, 
automated, polynucleotide synthesizer followed by 
ligation with appropriate ligases. For example, 
overlapping, complementary DNA fragments comprising 
15 bases may be synthesized semi-manually using
phosphoramidite chemistry, with end segments left
unphosphorylated to prevent polymerization during 
ligation. One end of the synthetic DNA is left with 
a "sticky end" corresponding to the site of action of 
a particular restriction endonuclease, and the other 
end is left with an end corresponding to the site of 
action of another restriction endonuclease. 
Alternatively, this approach can be fully automated. 
The DNA encoding the protein may be created by 
synthesizing longer single strand fragments (e.g., 
50-100 nucleotides long) in, for example, a Biosearch 
oligonucleotide synthesizer, and then ligating the 
fragments.
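
The assembly of a duplex from overlapping, complementary oligonucleotides can be pictured with a short sketch. It is offered as an illustration only: the function names and the half-length stagger are assumptions, phosphorylation state is not modeled, and in practice the end segments would be designed to match the chosen restriction endonuclease sites rather than left as produced here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def staggered_oligos(top, length=15):
    """Split a duplex into top- and bottom-strand oligos of `length` bases.

    Bottom-strand oligos are shifted by half an oligo so that each one
    bridges the junction between two neighboring top-strand oligos.
    All oligos are returned written 5' to 3'.
    """
    offset = length // 2
    top_oligos = [top[i:i + length] for i in range(0, len(top), length)]
    bottom_oligos = [
        reverse_complement(top[i:i + length])
        for i in range(offset, len(top), length)
    ]
    return top_oligos, bottom_oligos


# Hypothetical 45-base top strand used only to show the layout.
top_strand = "ACGTACGTAC" * 4 + "GATTC"
tops, bottoms = staggered_oligos(top_strand)
for oligo in tops + bottoms:
    print(oligo)
```

Because the bottom strand starts half an oligo in from the left end, the first bases of the top strand remain single-stranded, which corresponds to the kind of "sticky end" overhang the text describes for one end of the synthetic DNA.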

A method of producing the BABS of the 
invention is to produce a synthetic DNA encoding a 
polypeptide comprising, e.g., human FRs, and 
intervening "dummy" CDRs, or amino acids having no 
function except to define suitably situated unique 
restriction sites. This synthetic DNA is then 
altered by DNA replacement, in which restriction and 
ligation is employed to insert synthetic 
oligonucleotides encoding CDRs defining a desired 
binding specificity in the proper location between 
the FRs. This approach facilitates empirical 
refinement of the binding properties of the BABS. 
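
The replacement of "dummy" CDRs with CDRs of a desired specificity amounts to interleaving a fixed set of framework segments with an exchangeable set of CDR segments. The sketch below illustrates that bookkeeping, assuming each segment is held as a string (amino acid or DNA); the function name and all segment values are placeholders introduced for illustration and do not represent actual sequences.

```python
def graft(frameworks, cdrs):
    """Interleave FR1, CDR1, FR2, CDR2, FR3, CDR3, FR4 into one sequence."""
    if len(frameworks) != len(cdrs) + 1:
        raise ValueError("expected one more framework segment than CDRs")
    parts = []
    for framework, cdr in zip(frameworks, cdrs):
        parts.extend([framework, cdr])
    parts.append(frameworks[-1])
    return "".join(parts)


# Placeholder segments only; real FR and CDR sequences would come from
# sequence analysis of preexisting antibodies.
frameworks = ["[FR1]", "[FR2]", "[FR3]", "[FR4]"]
dummy_cdrs = ["[dummy1]", "[dummy2]", "[dummy3]"]
desired_cdrs = ["[CDR1]", "[CDR2]", "[CDR3]"]

scaffold = graft(frameworks, dummy_cdrs)   # starting material
product = graft(frameworks, desired_cdrs)  # after CDR replacement
print(scaffold)
print(product)
```

Keeping the frameworks and CDRs as separate lists mirrors the role of the unique restriction sites at the FR-CDR borders: any one CDR can be exchanged without disturbing the rest of the construct.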

This technique is dependent upon the ability 
to cleave a DNA corresponding in structure to a 
variable domain gene at specific sites flanking 
nucleotide sequences encoding CDRs. These 
restriction sites in some cases may be found in the 
native gene. Alternatively, non-native restriction
sites may be engineered into the nucleotide sequence
resulting in a synthetic gene with a different
sequence of nucleotides than the native gene, but 
encoding the same variable region amino acids because 
of the degeneracy of the genetic code. The fragments 
resulting from endonuclease digestion, and comprising 
FR-encoding sequences, are then ligated to non-native 
CDR-encoding sequences to produce a synthetic 
variable domain gene with altered antigen binding 
specificity. Additional nucleotide sequences 
encoding, for example, constant region amino acids or 
a bioactive molecule may then be linked to the gene
sequences to produce a bifunctional protein. 
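
Because the genetic code is degenerate, a restriction site can often be written into a coding sequence without altering the encoded amino acids. The following sketch shows one way to locate such positions for a given peptide. It is an illustration under stated assumptions: the function names are hypothetical, the standard genetic code is used, and the BamHI recognition sequence (GGATCC) serves only as an example site.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_codons(aa):
    """All codons that encode the given amino acid."""
    return [codon for codon, residue in CODON_TABLE.items() if residue == aa]


def silent_site_offsets(peptide, site):
    """DNA offsets at which `site` can be encoded without changing `peptide`."""
    offsets = []
    for start in range(3 * len(peptide) - len(site) + 1):
        end = start + len(site)
        compatible = True
        for index in range(start // 3, (end - 1) // 3 + 1):
            # Bases the site demands at this codon's three positions
            # (None where the site does not cover the position).
            required = [
                site[pos - start] if start <= pos < end else None
                for pos in range(3 * index, 3 * index + 3)
            ]
            if not any(
                all(r is None or r == b for r, b in zip(required, codon))
                for codon in synonymous_codons(peptide[index])
            ):
                compatible = False
                break
        if compatible:
            offsets.append(start)
    return offsets


# Gly-Ser can be encoded as GGA TCC, which is a BamHI site.
print(silent_site_offsets("AGS", "GGATCC"))
```

Each codon is checked independently, so the search stays fast even for a full variable domain; it reports where a site could be placed, not which of the compatible codons to use elsewhere in the gene.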

The expression of these synthetic DNAs can
be achieved in both procaryotic and eucaryotic
systems via transfection with an appropriate vector.
In E. coli and other microbial hosts, the synthetic
genes can be expressed as a fusion protein which is
subsequently cleaved. Expression in eucaryotes can 
be accomplished by the transfection of DNA sequences 
encoding CDR and FR region amino acids and the amino 
acids defining a second function into a myeloma or 
other type of cell line. By this strategy intact 
hybrid antibody molecules having hybrid Fv regions 
and various bioactive proteins including a 
biosynthetic binding site may be produced. For 
fusion proteins expressed in bacteria, subsequent
proteolytic cleavage of the isolated fusions can be 
performed to yield free BABS, which can be renatured 
to obtain an intact biosynthetic, hybrid antibody 
binding site. 



Heretofore, it has not been possible to
cleave the heavy and light chain region to separate
the variable and constant regions of an
immunoglobulin so as to produce intact Fv, except in
specific cases not of commercial utility. However,
one method of producing BABS in accordance with this
invention is to redesign DNAs encoding the heavy and
light chains of an immunoglobulin, optionally
altering its specificity or humanizing its FRs, and
incorporating a cleavage site and "hinge region"
between the variable and constant regions of both the
heavy and light chains. Such chimeric antibodies can 
be produced in transfectomas or the like and 
subsequently cleaved using a preselected 
endopeptidase.

The hinge region is a sequence of amino
acids which serves to promote efficient cleavage by a
preselected cleavage agent at a preselected, built-in 
cleavage site. It is designed to promote cleavage 
preferentially at the cleavage site when the 
polypeptide is treated with the cleavage agent in an 
appropriate environment. 

The hinge region can take many different 
forms. Its design involves selection of amino acid 
residues (and a DNA fragment encoding them) which
impart to the region of the fused protein about the 
cleavage site an appropriate polarity, charge 
distribution, and stereochemistry which, in the 
aqueous environment where the cleavage takes place, 
efficiently exposes the cleavage site to the cleavage 
agent in preference to other potential cleavage sites 
that may be present in the polypeptide, and/or to
improve the kinetics of the cleavage reaction. In
specific cases, the amino acids of the hinge are 
selected and assembled in sequence based on their 
known properties, and then the fused polypeptide 
sequence is expressed, tested, and altered for 
refinement. 

The hinge region is free of cysteine. This 
enables the cleavage reaction to be conducted under 
conditions in which the protein assumes its tertiary 
conformation, and may be held in this conformation by 
intramolecular disulfide bonds. It has been 
discovered that in these conditions access of the 
protease to potential cleavage sites which may be 
present within the target protein is hindered. The 
hinge region may comprise an amino acid sequence 
which includes one or more proline residues. This 
allows formation of a substantially unfolded 
molecular segment. Aspartic acid, glutamic acid,
arginine, lysine, serine, and threonine residues
maximize ionic interactions and may be present in 
amounts and/or in sequence which renders the moiety 
comprising the hinge water soluble. 

The cleavage site preferably is immediately 
adjacent the Fv polypeptide chains and comprises one
amino acid or a sequence of amino acids exclusive of 
any sequence found in the amino acid structure of the 
chains in the Fv. The cleavage site preferably is 
designed for unique or preferential cleavage by a 
specific selected agent. Endopeptidases are 
preferred, although non-enzymatic (chemical) cleavage 
agents may be used. Many useful cleavage agents, for 
instance, cyanogen bromide, dilute acid, trypsin,
Staphylococcus aureus V-8 protease, post proline
cleaving enzyme, blood coagulation Factor Xa, 
enterokinase, and renin, recognize and preferentially 
or exclusively cleave particular cleavage sites. One 
currently preferred cleavage agent is V-8 protease.
The currently preferred cleavage site is a Glu 
residue. Other useful enzymes recognize multiple 
residues as a cleavage site, e.g., factor Xa 
(Ile-Glu-Gly-Arg) or enterokinase 
(Asp-Asp-Asp-Asp-Lys). The principles of this
selective cleavage approach may also be used in the
design of the linker and spacer sequences of the
multifunctional constructs of the invention where an
excisable linker or selectively cleavable linker or
spacer is desired. 
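
The requirement that a cleavage site be exclusive of any sequence found in the Fv chains can be checked by scanning the sequences for the recognition motif of the chosen agent. The sketch below uses the three recognition sequences named above (Glu for V-8 protease, Ile-Glu-Gly-Arg for factor Xa, Asp-Asp-Asp-Asp-Lys for enterokinase). The function names and test sequences are hypothetical, and the cut is assumed to fall on the carboxyl side of the motif.

```python
# Recognition motifs named in the text, in one-letter code.
CLEAVAGE_MOTIFS = {
    "V-8 protease": "E",
    "factor Xa": "IEGR",
    "enterokinase": "DDDDK",
}


def cleavage_positions(seq, motif):
    """Indices immediately after each occurrence of `motif` in `seq`."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start + len(motif))
        start = seq.find(motif, start + 1)
    return positions


def cleaves_only_in_hinge(domain, hinge, motif):
    """True if `motif` occurs in the hinge but nowhere in the domain.

    Motifs spanning the domain/hinge junction are not considered here.
    """
    return bool(cleavage_positions(hinge, motif)) and not cleavage_positions(
        domain, motif
    )


# Hypothetical sequences used only to exercise the functions.
domain = "QVQLKASGSA"
hinge = "SGSIEGRSG"
for agent, motif in CLEAVAGE_MOTIFS.items():
    print(agent, cleaves_only_in_hinge(domain, hinge, motif))
```

A single-residue site such as Glu will occur in most natural domains, which is one reason the text notes that cleavage is conducted under conditions where the folded protein hinders access to internal sites.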

Design of Synthetic V H and V L Mimics

FRs from the heavy and light chain murine 
anti-digoxin monoclonal 26-10 (Figures 4A and 4B) 
were encoded on the same DNAs with CDRs from the 
murine anti-lysozyme monoclonal glp-4 heavy chain 
(Figure 3 sequence 1) and light chain to produce V H 
(Figure 4C) and V L (Figure 4D) regions together 
defining a biosynthetic antibody binding site which 
is specific for lysozyme. Murine CDRs from both the 
heavy and light chains of monoclonal glp-4 were 
encoded on the same DNAs with FRs from the heavy and 
light chains of human myeloma antibody NEWM (Figures 
4E and 4F). The resulting interspecies chimeric
antibody binding domain has reduced immunogenicity in
humans because of its human FRs, and specificity for
lysozyme because of its murine CDRs.

A synthetic DNA was designed to facilitate
CDR insertions into a human heavy chain FR and to
facilitate empirical refinement of the resulting 
chimeric amino acid sequence. This DNA is depicted 
in Figure 5. 

A synthetic, bifunctional FB-binding site 
protein was also designed at the DNA level, 
expressed, purified, renatured, and shown to bind 
specifically with a preselected antigen (digoxin) and 
Fc. The detailed primary structure of this construct 
is shown in Figure 6; its tertiary structure is 
illustrated schematically in Figure 2B. 

Details of these and other experiments, and 
additional design principles on which the invention 
is based, are set forth below. 

GENE DESIGN AND EXPRESSION

Given known variable region DNA sequences, 
synthetic V L and V H genes may be designed which
encode native or near native FR and CDR amino acid
sequences from an antibody molecule, each separated
by unique restriction sites located as close to
FR-CDR and CDR-FR borders as possible.
Alternatively, genes may be designed which encode 
native FR sequences which are similar or identical to 
the FRs of an antibody molecule from a selected 
species, each separated by "dummy" CDR sequences
containing strategically located restriction sites.
These DNAs serve as starting materials for producing
BABS, as the native or "dummy" CDR sequences may be
excised and replaced with sequences encoding the CDR
amino acids defining a selected binding site.
Alternatively, one may design and directly synthesize 
native or near-native FR sequences from a first 
antibody molecule, and CDR sequences from a second 
antibody molecule. Any one of the V H and V L
sequences described above may be linked together
directly, via an amino acid chain or linker
connecting the C-terminus of one chain with the
N-terminus of the other.

These genes, once synthesized, may be cloned 
with or without additional DNA sequences coding for, 
e.g., an antibody constant region, enzyme, or toxin, 
or a leader peptide which facilitates secretion or 
intracellular stability of a fusion polypeptide. The 
genes then can be expressed directly in an 
appropriate host cell, or can be further engineered
before expression by the exchange of FR, CDR, or
"dummy" CDR sequences with new sequences. This
manipulation is facilitated by the presence of the 
restriction sites which have been engineered into the 
gene at the FR-CDR and CDR-FR borders. 

Figure 3 illustrates the general approach to 
designing a chimeric V H ; further details of 
exemplary designs at the DNA level are shown in
Figures 4A-4F. Figure 3, lines 1 and 2, show the
amino acid sequences of the heavy chain variable
region of the murine monoclonals glp-4
(anti-lysozyme) and 26-10 (anti-digoxin), including
the four FR and three CDR sequences of each. Line 3
shows the sequence of a chimeric V H which comprises
26-10 FRs and glp-4 CDRs. As illustrated, the hybrid
protein of line 3 is identical to the native protein
of line 2, except that 1) the sequence TFTNYYIHWLK
has replaced the sequence IFTDFYMNWVR, 2) 
EWIGWIYPGNGNTKYNENFKG has replaced 
DYIGYISPYSGVTGYNQKFKG, 3) RYTHYYF has replaced
GSSGNKWAM, and 4) A has replaced V as the sixth amino 
acid beyond CDR-2. These changes have the effect of 
changing the specificity of the 26-10 V H to mimic 
the specificity of glp-4. The Ala to Val single 
amino acid replacement within the relatively 
conserved framework region of 26-10 is an example of 
the replacement of an amino acid outside the 
hypervariable region made for the purpose of altering 
specificity by CDR replacement. Beneath sequence 3 
of Figure 3, the restriction sites in the DNA 
encoding the chimeric V H (see Figures 4A-4F) are 
shown which are disposed about the CDR-FR borders. 
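
The three segment replacements enumerated above can be represented as a substitution table and applied to a host sequence. The sketch below uses the segments quoted in the text; the function names and the flanking residues in the example are hypothetical, and the fourth change (the single residue beyond CDR-2) is not modeled because its absolute position is not given here.

```python
# (26-10 segment, glp-4 replacement) pairs as quoted in the text.
SWAPS = [
    ("IFTDFYMNWVR", "TFTNYYIHWLK"),
    ("DYIGYISPYSGVTGYNQKFKG", "EWIGWIYPGNGNTKYNENFKG"),
    ("GSSGNKWAM", "RYTHYYF"),
]


def apply_swaps(sequence, swaps):
    """Replace each old segment with its new segment, exactly once each."""
    for old, new in swaps:
        if sequence.count(old) != 1:
            raise ValueError(f"segment {old!r} must occur exactly once")
        sequence = sequence.replace(old, new)
    return sequence


def differing_positions(a, b):
    """1-based positions at which two equal-length segments differ."""
    if len(a) != len(b):
        raise ValueError("segments must be the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Toy host: the first quoted segment with made-up flanking residues.
print(apply_swaps("xxIFTDFYMNWVRyy", SWAPS[:1]))
print(differing_positions(*SWAPS[0]))
```

Listing the differing positions makes plain that the first two replacements preserve segment length while changing only some residues, whereas the third replaces a nine-residue segment with a seven-residue one.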

Lines 4 and 5 of Figure 3 represent another 
construct. Line 4 is the full length V H of the 
human antibody NEWM. That human antibody may be made 
specific for lysozyme by CDR replacement as shown in 
line 5. Thus, for example, the segment TFTNYYIHWLK 
from glp-4 replaces TFSNDYYTWVR of NEWM, and its 
other CDRs are replaced as shown. This results in a 
V H comprising a human framework with murine
sequences determining specificity.

By sequencing any antibody, or obtaining the 
sequence from the literature, in view of this 
disclosure one skilled in the art can produce a BABS 
of any desired specificity comprising any desired 
framework region. Diagrams such as Figure 3 
comparing the amino acid sequence are valuable in 
suggesting which particular amino acids should be
replaced to determine the desired complementarity.
Expressed sequences may be tested for binding and 
refined by exchanging selected amino acids in 
relatively conserved regions, based on observation of 
trends in amino acid sequence data and/or computer 
modeling techniques. 

Significant flexibility in V H and V L
design is possible because the amino acid sequences
are determined at the DNA level, and the manipulation
of DNA can be accomplished easily.

For example, the DNA sequence for murine V H 
and V L 26-10 containing specific restriction sites 
flanking each of the three CDRs was designed with the 
aid of a commercially available computer program 
which performs combined reverse translation and 
restriction site searches ("RV.exe" by Compugene,
Inc.). The known amino acid sequences for the V H and
V L 26-10 polypeptides were entered, and all
potential DNA sequences which encode those peptides
and all potential restriction sites were analyzed by
the program. The program can, in addition, select
DNA sequences encoding the peptide using only codons
preferred by E. coli if this bacterium is to be the host
expression organism of choice. Figures 4A and 4B
show an example of program output. The nucleic acid
sequences of the synthetic gene and the corresponding
amino acids are shown. Sites of restriction
endonuclease cleavage are also indicated. The CDRs
of these synthetic genes are underlined.

The DNA sequences for the synthetic 26-10
V H and V L are designed so that one or both of the 
restriction sites flanking each of the three CDRs are 
unique. A six base site (such as that recognized by 
Bsm I or BspM I) is preferred, but where six base 
sites are not possible, four or five base sites are 
used. These sites, if not already unique, are 
rendered unique within the gene by eliminating other 
occurrences within the gene without altering 
necessary amino acid sequences. Preferred cleavage 
sites are those that, once cleaved, yield fragments 
with sticky ends just outside of the boundary of the 
CDR within the framework. However, such ideal sites 
are only occasionally possible because the FR-CDR 
boundary is not an absolute one, and because the 
amino acid sequence of the FR may not permit a 
restriction site. In these cases, flanking sites in 
the FR which are more distant from the predicted 
boundary are selected. 
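
The combined reverse translation and restriction site search described above can be sketched in Python. This is not the Compugene program, whose internals are not described here. The one-codon-per-residue `PREFERRED` table is an illustrative assumption to be checked against a real E. coli codon usage table, and the function names are hypothetical. The example uses the (Gly-Gly-Gly-Gly-Ser) 3 linker peptide and the BamHI recognition sequence GGATCC, which can be written silently across any Gly-Ser pair as GGA TCC.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codon -> one-letter amino acid ('*' = stop).
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
# All synonymous codons for each amino acid.
CODONS = {}
for codon, aa in CODE.items():
    CODONS.setdefault(aa, []).append(codon)

# One codon per residue: an illustrative choice, not an authoritative usage table.
PREFERRED = dict(zip(
    "ARNDCQEGHILKMFPSTWYV",
    "GCG CGT AAC GAT TGC CAG GAA GGT CAT ATT CTG AAA ATG TTT CCG AGC ACC TGG TAT GTG".split(),
))


def translate(dna):
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def reverse_translate(peptide):
    return "".join(PREFERRED[aa] for aa in peptide)


def codon_fits(codon, codon_index, site, offset):
    """True if codon, placed at codon_index, agrees with every site base it overlaps."""
    for k in range(3):
        j = 3 * codon_index + k - offset  # position within the site
        if 0 <= j < len(site) and codon[k] != site[j]:
            return False
    return True


def covered(site, offset):
    """Indices of the codons overlapped by a site starting at nucleotide offset."""
    return range(offset // 3, (offset + len(site) - 1) // 3 + 1)


def silent_site_offsets(peptide, site):
    """Offsets at which the site can be encoded without changing the peptide."""
    return [off for off in range(3 * len(peptide) - len(site) + 1)
            if all(any(codon_fits(c, ci, site, off) for c in CODONS[peptide[ci]])
                   for ci in covered(site, off))]


def introduce_site(peptide, dna, site, offset):
    """Re-pick only the overlapped codons so that the site appears at offset."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    for ci in covered(site, offset):
        codons[ci] = next(c for c in CODONS[peptide[ci]] if codon_fits(c, ci, site, offset))
    return "".join(codons)


linker = "GGGGS" * 3
gene = reverse_translate(linker)
offsets = silent_site_offsets(linker, "GGATCC")        # BamHI
engineered = introduce_site(linker, gene, "GGATCC", offsets[0])
print(offsets, engineered, engineered.count("GGATCC"))
```

Counting occurrences of the recognition sequence after the edit is the uniqueness check the text calls for: a site is useful for cassette exchange only if it occurs once in the gene.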

Figure 5 discloses the nucleotide and 
corresponding amino acid sequence (shown in standard 
single letter code) of a synthetic DNA comprising a 
master framework gene having the generic structure: 

R 1 -FR 1 -X 1 -FR 2 -X 2 -FR 3 -X 3 -FR 4 -R 2 

where R 1 and R 2 are restricted ends which are to 
be ligated into a vector, and X 1 , X 2 , and X 3
are DNA sequences whose function is to provide 
convenient restriction sites for CDR insertion. This 
particular DNA has murine FR sequences and unique,
6-base restriction sites adjacent the FR borders so
that nucleotide sequences encoding CDRs from a
desired monoclonal can be inserted easily.
Restriction endonuclease digestion sites are
indicated with their abbreviations; enzymes of choice
for CDR replacement are underscored. Digestion of
the gene with the following restriction endonucleases
results in 3' and 5' ends which can easily be matched
up with and ligated to native or synthetic CDRs of
desired specificity: KpnI and BstXI are used for
ligation of CDR 1; XbaI and DraI for CDR 2; and
BssHII and ClaI for CDR 3.

OLIGONUCLEOTIDE SYNTHESIS

The synthetic genes and DNA fragments 
designed as described above preferably are produced 
by assembly of chemically synthesized 
oligonucleotides. 15-100mer oligonucleotides may be
synthesized on a Biosearch DNA Model 8600
Synthesizer, and purified by polyacrylamide gel
electrophoresis (PAGE) in Tris-Borate-EDTA buffer 
(TBE) . The DNA is then electroeluted from the gel. 
Overlapping oligomers may be phosphorylated by T4 
polynucleotide kinase and ligated into larger blocks 
which may also be purified by PAGE. 

CLONING OF SYNTHETIC OLIGONUCLEOTIDES

The blocks or the pairs of longer
oligonucleotides may be cloned into E. coli using a
suitable, e.g., pUC, cloning vector. Initially, this
vector may be altered by single strand mutagenesis to
eliminate residual six base altered sites. For
example, V H may be synthesized and cloned into pUC
as five primary blocks spanning the following
restriction sites: 1. EcoRI to first NarI site; 2.
first NarI to XbaI; 3. XbaI to SalI; 4. SalI to NcoI;
5. NcoI to BamHI. These cloned fragments may then be
isolated and assembled in several three-fragment
ligations and cloning steps into the pUC8 plasmid.
Desired ligations selected by PAGE are then
transformed into, for example, E. coli strain JM83,
and plated onto LB Ampicillin + Xgal plates according
to standard procedures. The gene sequence may be
confirmed by supercoil sequencing after cloning, or
after subcloning into M13 via the dideoxy method of
Sanger.

PRINCIPLE OF CDR EXCHANGE

Three CDRs (or alternatively, four FRs) can
be replaced per V H or V L . In simple cases, this
can be accomplished by cutting the shuttle pUC
plasmid containing the respective genes at the two
unique restriction sites flanking each CDR or FR, 
removing the excised sequence, and ligating the 
vector with a native nucleic acid sequence or a 
synthetic oligonucleotide encoding the desired CDR or 
FR. This three part procedure would have to be 
repeated three times for total CDR replacement and 
four times for total FR replacement. Alternatively, 
a synthetic nucleotide encoding two consecutive CDRs 
separated by the appropriate FR can be ligated to a 
pUC or other plasmid containing a gene whose
corresponding CDRs and FR have been cleaved out.
This procedure reduces the number of steps required
to perform CDR and/or FR exchange.
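
At the level of the DNA string, the cut, remove, and ligate procedure described above reduces to replacing whatever lies between two unique flanking sites. The Python sketch below is a deliberately simplified model: it treats the recognition sequences as fixed boundaries and ignores the actual cut positions and overhangs. The toy gene and the function name `swap_cassette` are hypothetical. XbaI (TCTAGA) and DraI (TTTAAA) are used because the text names that pair for CDR 2.

```python
def swap_cassette(gene, site5, site3, new_insert):
    """Replace the DNA between two unique flanking sites; returns (new_gene, removed)."""
    for site in (site5, site3):
        if gene.count(site) != 1:
            raise ValueError(f"site {site} is not unique in the gene")
    start = gene.index(site5) + len(site5)
    end = gene.index(site3)
    if start > end:
        raise ValueError("5' site must precede 3' site")
    return gene[:start] + new_insert + gene[end:], gene[start:end]


XBAI, DRAI = "TCTAGA", "TTTAAA"
# Hypothetical toy gene: flank + XbaI + old cassette + DraI + flank.
toy_gene = "ATGGCT" + XBAI + "GGTTATATTAGC" + DRAI + "GGCTAA"
new_gene, removed = swap_cassette(toy_gene, XBAI, DRAI, "TGGATTTATCCG")
print(new_gene, removed)
```

Exchanging two consecutive CDRs together with the intervening FR, as the text suggests, corresponds to a single call with the outermost pair of sites and a longer insert.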

EXPRESSION OF PROTEINS 

The engineered genes can be expressed in 
appropriate prokaryotic hosts such as various strains
of E. coli, and in eucaryotic hosts such as Chinese
hamster ovary cells, murine myeloma, and human
myeloma/transfectoma cells.

For example, if the gene is to be expressed
in E. coli, it may first be cloned into an expression
vector. This is accomplished by positioning the 
engineered gene downstream from a promoter sequence 
such as trp or tac, and a gene coding for a leader 
peptide. The resulting expressed fusion protein 
accumulates in refractile bodies in the cytoplasm of 
the cells, and may be harvested after disruption of 
the cells by French press or sonication. The 
refractile bodies are solubilized, and the expressed 
proteins refolded and cleaved by the methods already 
established for many other recombinant proteins. 

If the engineered gene is to be expressed in
myeloma cells, the conventional expression system for
immunoglobulins, it is first inserted into an
expression vector containing, for example, the Ig
promoter, a secretion signal, immunoglobulin
enhancers, and various introns. This plasmid may
also contain sequences encoding all or part of a
constant region, enabling an entire part of a heavy
or light chain to be expressed. The gene is
transfected into myeloma cells via established
electroporation or protoplast fusion methods. Cells
so transfected can express V L or V H fragments,
V L2 or V H2 homodimers, V L -V H heterodimers,
V H -V L or V L -V H single chain polypeptides,
complete heavy or light immunoglobulin chains, or 
portions thereof, each of which may be attached in 
the various ways discussed above to a protein region 
having another function (e.g., cytotoxicity). 

Vectors containing a heavy chain V region 
(or V and C regions) can be cotransfected with 
analogous vectors carrying a light chain V region (or 
V and C regions), allowing for the expression of 
noncovalently associated binding sites (or complete 
antibody molecules).

In the examples which follow, a specific 
example of how to make a single chain binding site is 
disclosed, together with methods employed to assess 
its binding properties. Thereafter, a protein
construct having two functional domains is 
disclosed. Lastly, there is disclosed a series of 
additional targeted proteins which exemplify the 
invention. 

I. EXAMPLE OF CDR EXCHANGE AND EXPRESSION

The synthetic genes coding for murine V H
and V L 26-10 shown in Figures 4A and 4B were
designed from the known amino acid sequence of the
protein with the aid of Compugene, a software
program. These genes, although coding for the native
amino acid sequences, also contain non-native and
often unique restriction sites flanking nucleic acid
sequences encoding CDRs to facilitate CDR
replacement as noted above.

Both the 3' and 5' ends of the large
synthetic oligomers were designed to include 6-base
restriction sites, present in the genes and the pUC
vector. Furthermore, those restriction sites in the
synthetic genes which were only suited for assembly
but not for cloning in pUC were extended by "helper"
cloning sites with matching sites in pUC.

Cloning of the synthetic DNA and later
assembly of the gene is facilitated by the spacing of
unique restriction sites along the gene. This allows
corrections and modifications by cassette mutagenesis
at any location. Among them are alterations near the
5' or 3' ends of the gene as needed for the
adaptation to different expression vectors. For
example, a PstI site is positioned near the 5' end of
the V H gene. Synthetic linkers can be attached
easily between this site and a restriction site in
the expression plasmid. These genes were synthesized
by assembling oligonucleotides as described above
using a Biosearch Model 8600 DNA Synthesizer. They
were ligated to vector pUC8 for transformation of E.
coli.

Specific CDRs may be cleaved from the
synthetic V H gene by digestion with the following
pairs of restriction endonucleases: HphI and BstXI
for CDR 1; XbaI and DraI for CDR 2; and BanII and
BanI for CDR 3. After removal of one CDR, another
CDR of desired specificity may be ligated directly
into the restricted gene in its place if the 3' and
5' ends of the restricted gene and the new CDR
contain complementary single stranded DNA sequences.

In the present example, the three CDRs of 
each of murine V H 26-10 and V L 26-10 were
replaced with the corresponding CDRs of glp-4. The 
nucleic acid sequences and corresponding amino acid 
sequences of the chimeric V H and V L genes 
encoding the FRs of 26-10 and CDRs of glp-4 are shown 
in Figures 4C and 4D. The positions of the 
restriction endonuclease cleavage sites are noted 
with their standard abbreviations. CDR sequences are 
underlined as are the restriction endonucleases of 
choice useful for further CDR replacement. 

These genes were cloned into pUC8, a shuttle 
plasmid. To retain unique restriction sites after 
cloning, the V H -like gene was spliced into the
EcoRI and HindIII or BamHI sites of the plasmid.

Direct expression of the genes may be
achieved in E. coli. Alternatively, the gene may be
preceded by a leader sequence and expressed in E.
coli as a fusion product by splicing the fusion gene
into the host gene whose expression is regulated by
interaction of a repressor with the respective
operator. The protein can be induced by starvation
in minimal medium and by chemical inducers. The
V H -V L biosynthetic 26-10 gene has been expressed
as such a fusion protein behind the trp and tac
promoters. The gene translation product of interest
may then be cleaved from the leader in the fusion
protein by, e.g., cyanogen bromide degradation,
tryptic digestion, mild acid cleavage, and/or
digestion with factor Xa protease. Therefore, a
shuttle plasmid containing a synthetic gene encoding
a leader peptide having a site for mild acid
cleavage, and into which has been spliced the
synthetic BABS gene, was used for this purpose. In
addition, synthetic DNA sequences encoding a signal
peptide for secretion of the processed target protein
into the periplasm of the host cell can also be
incorporated into the plasmid.

After harvesting the gene product and 
optionally releasing it from a fusion peptide, its 
activity as an antibody binding site and its 
specificity for glp-4 (lysozyme) epitope are assayed
by established immunological techniques, e.g.,
affinity chromatography and radioimmunoassay.
Correct folding of the protein to yield the proper
three-dimensional conformation of the antibody
binding site is prerequisite for its activity. This
occurs spontaneously in a host such as a myeloma cell
which naturally expresses immunoglobulin proteins.
Alternatively, for bacterial expression, the protein
forms inclusion bodies which, after harvesting, must
be subjected to a specific sequence of solvent
conditions (e.g., diluted 20X from 8 M urea, 0.1 M
Tris-HCl, pH 9, into 0.15 M NaCl, 0.01 M sodium
phosphate, pH 7.4 (Hochman et al. (1976) Biochem.
15:2706-2710)) to assume its correct conformation and
hence its active form.

Figures 4E and 4F show the DNA and amino
acid sequence of chimeric V H and V L comprising
human FRs from NEWM and murine CDRs from glp-4. The
CDRs are underlined, as are restriction sites of
choice for further CDR replacement or empirically
determined refinement.

These constructs also constitute master 
framework genes, this time constructed of human 
framework sequences. They may be used to construct 
BABS of any desired specificity by appropriate CDR 
replacement. 

Binding sites with other specificities have 
also been designed using the methodologies disclosed 
herein. Examples include those having FRs from the 
human NEWM antibody and CDRs from murine 26-10 
(Figure 9A), murine 26-10 FRs and G-loop CDRs (Figure
9B), FRs and CDRs from murine MOPC-315 (Figure 9C),
FRs and CDRs from an anti-human carcinoembryonic
antigen monoclonal antibody (Figure 9D), and FRs and
CDRs 1, 2, and 3 from V L and FRs and CDR 1 and 3
from the V H of the anti-CEA antibody, with CDR 2
from a consensus immunoglobulin gene (Figure 9E).

II. Model Binding Site:

The digoxin binding site of the IgG 2a,κ
monoclonal antibody 26-10 has been analyzed by
Mudgett-Hunter and colleagues (unpublished). The
26-10 V region sequences were determined from both
amino acid sequencing and DNA sequencing of 26-10 H
and L chain mRNA transcripts (D. Panka, J.N. &
M.N.M., unpublished data). The 26-10 antibody
exhibits a high digoxin binding affinity [K o = 5.4
X 10^9 M^-1] and has a well-defined specificity
profile, providing a baseline for comparison with the
biosynthetic binding sites mimicking its structure.

Protein Design:

Crystallographically determined atomic 
coordinates for Fab fragments of 26-10 were obtained 
from the Brookhaven Data Bank. Inspection of the 
available three-dimensional structures of Fv regions 
within their parent Fab fragments indicated that the 
Euclidean distance between the C-terminus of the V H
domain and the N-terminus of the V L domain is about
35 Å. Considering that the peptide unit length is
approximately 3.8 Å, a 15 residue linker was selected
to bridge this gap. The linker was designed so as to
exhibit little propensity for secondary structure and
not to interfere with domain folding. Thus, the 15
residue sequence (Gly-Gly-Gly-Gly-Ser) 3 was
selected to connect the V H carboxyl- and V L
amino-termini.
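
The linker length arithmetic can be made explicit. In the Python sketch below, the 35 Å gap and the 3.8 Å peptide unit length come from the text. Treating the chain as fully extended when computing the minimum residue count is a simplifying assumption, and the reading that the extra residues provide slack for a flexible, not fully extended linker is an interpretation rather than something the text states.

```python
import math

GAP_ANGSTROM = 35.0      # V H C-terminus to V L N-terminus, from the text
RESIDUE_SPAN = 3.8       # approximate peptide unit length, from the text
LINKER_UNIT = "GGGGS"    # Gly-Gly-Gly-Gly-Ser

# Fewest residues that could span the gap if the chain were fully extended.
min_residues = math.ceil(GAP_ANGSTROM / RESIDUE_SPAN)

linker = LINKER_UNIT * 3
max_span = len(linker) * RESIDUE_SPAN  # fully extended length of the chosen linker

print(min_residues, len(linker), max_span)
```

Under these assumptions roughly ten residues would just reach across the gap, while the 15 residue linker could extend to about 57 Å.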

Binding studies with single chain binding
sites having linkers of fewer than or more than 15 residues
demonstrate the importance of the prerequisite
distance which must separate V H from V L ; for
example, a (Gly 4 -Ser) 1 linker does not
demonstrate binding activity, and those with
(Gly 4 -Ser) 5 linkers exhibit very low activity
compared to those with (Gly 4 -Ser) 3 linkers.

Gene Synthesis: 

Design of the 744 base sequence for the
synthetic binding site gene was derived from the Fv
protein sequence of 26-10 by choosing codons
frequently used in E. coli. The model of this
representative synthetic gene is shown in Figure 8,
discussed previously. Synthetic genes coding for the
trp promoter-operator, the modified trp LE leader
peptide (MLE), the sequence of which is shown in
Figure 10A, and V H were prepared largely as
described previously. The gene coding for V H was
assembled from 46 chemically synthesized
oligonucleotides, all 15 bases long, except for
terminal fragments (13 to 19 bases) that included
cohesive cloning ends. Between 8 and 15 overlapping
oligonucleotides were enzymatically ligated into
double stranded DNA, cut at restriction sites
suitable for cloning (NarI, XbaI, SalI, SacII, SacI),
purified by PAGE on 8% gels, and cloned in pUC which
was modified to contain additional cloning sites in
the polylinker. The cloned segments were assembled
stepwise into the complete gene mimicking V H by
ligations in the pUC cloning vector.
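
The assembly of a duplex from overlapping short oligonucleotides can be illustrated by tiling a sequence into staggered fragments on the two strands, so that every break in one strand is bridged by an oligonucleotide of the other. The Python sketch below is a simplified illustration, not the layout actually used for the V H gene: the 7 base stagger, the toy sequence, and the function names are assumptions, and the terminal fragments here carry no cohesive cloning ends.

```python
def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tile(top, size=15, stagger=7):
    """Split a duplex into staggered oligos; all oligos are written 5'->3'."""
    tops = [top[i:i + size] for i in range(0, len(top), size)]
    # Bottom-strand breaks are shifted by `stagger` relative to top-strand breaks.
    cuts = [0] + list(range(stagger, len(top), size)) + [len(top)]
    bottoms = [revcomp(top[a:b]) for a, b in zip(cuts, cuts[1:])]
    return tops, bottoms


toy_top = ("ACGTTGCA" * 6)[:45]   # hypothetical 45 base top strand
tops, bottoms = tile(toy_top)
print([len(t) for t in tops], [len(b) for b in bottoms])
```

Because the break points on the two strands never coincide, annealing holds neighboring oligonucleotides in register before the enzymatic ligation step.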

The gene mimicking 26-10 V L was assembled
from 12 long synthetic polynucleotides ranging in
size from 33 to 88 base pairs, prepared in automated
DNA synthesizers (Model 6500, Biosearch, San Rafael,
CA; Model 380A, Applied Biosystems, Foster City,
CA). Five individual double stranded segments were
made out of pairs of long synthetic oligonucleotides
spanning six-base restriction sites in the gene
(AatII, BstEII, KpnI, HindIII, BglII, and PstI). In
one case, four long overlapping strands were combined
and cloned. Gene fragments bounded by restriction
sites for assembly that were absent from the pUC
polylinker, such as AatII and BstEII, were flanked by
EcoRI and BamHI ends to facilitate cloning.

The linker between V H and V L , encoding
(Gly-Gly-Gly-Gly-Ser) 3 , was cloned from two long
synthetic oligonucleotides, 54 and 62 bases long,
spanning SacI and AatII sites, the latter followed by
an EcoRI cloning end. The complete single chain
binding site gene was assembled from the V H , V L ,
and linker genes to produce a construct,
corresponding to aspartyl-prolyl-V H -<linker>-V L ,
flanked by EcoRI and PstI restriction sites.

The trp promoter-operator, starting from its
SspI site, was assembled from 12 overlapping 15 base
oligomers, and the MLE leader gene was assembled from
24 overlapping 15 base oligomers. These were cloned
and assembled in pUC using the strategy of assembly
sites flanked by cloning sites. The final expression
plasmid was constructed in the pBR322 vector by a
3-part ligation using the sites SspI, EcoRI, and PstI
(see Figure 10B). Intermediate DNA fragments and
assembled genes were sequenced by the dideoxy method.

Fusion Protein Expression:

Single-chain protein was expressed as a
fusion protein. The MLE leader gene (Fig. 10A) was
derived from E. coli trp LE sequence and expressed
under the control of a synthetic trp promoter and
operator. E. coli strain JM83 was transformed with
the expression plasmid and protein expression was
induced in M9 minimal medium by addition of
indoleacrylic acid (10 µg/ml) at a cell density
with A 600 = 1. The high expression levels of the
fusion protein resulted in its accumulation as
insoluble protein granules, which were harvested from
cell paste (Figure 11, Lane 1).

Fusion Protein Cleavage: 

The MLE leader was removed from the binding
site protein by acid cleavage of the Asp-Pro peptide
bond engineered at the junction of the MLE and
binding site sequences. The washed protein granules
containing the fusion protein were cleaved in 6 M
guanidine-HCl + 10% acetic acid, pH 2.5, incubated at
37°C for 96 hrs. The reaction was stopped through
precipitation by addition of a 10-fold excess of
ethanol with overnight incubation at -20°C, followed
by centrifugation and storage at -20°C until further
purification (Figure 11, Lane 2).

Protein Purification: 

The acid cleaved binding site was separated
from remaining intact fused protein species by
chromatography on DEAE cellulose. The precipitate
obtained from the cleavage mixture was redissolved in
6 M guanidine-HCl + 0.2 M Tris-HCl, pH 8.2, + 0.1 M
2-mercaptoethanol and dialyzed exhaustively against 6
M urea + 2.5 mM Tris-HCl, pH 7.5, + 1 mM EDTA.
2-Mercaptoethanol was added to a final concentration
of 0.1 M, the solution was incubated for 2 hrs at
room temperature and loaded onto a 2.5 X 45 cm column
of DEAE cellulose (Whatman DE 52), equilibrated with
6 M urea + 2.5 mM Tris-HCl + 1 mM EDTA, pH 7.5. The
intact fusion protein bound weakly to the DE 52
column such that its elution was retarded relative to
that of the binding protein. The first protein
fractions which eluted from the column after loading
and washing with urea buffer contained BABS protein
devoid of intact fusion protein. Later fractions
contaminated with some fused protein were pooled and
rechromatographed on DE 52, and the recovered single
chain binding protein was combined with other purified
protein into a single pool (Figure 11, Lane 3).

Refolding:

The 26-10 binding site mimic was refolded as
follows: the DE 52 pool, disposed in 6 M urea + 2.5
mM Tris-HCl + 1 mM EDTA, was adjusted to pH 8 and
reduced with 0.1 M 2-mercaptoethanol at 37°C for 90
min. This was diluted at least 100-fold with 0.01 M
sodium acetate, pH 5.5, to a concentration below 10
µg/ml and dialyzed at 4°C for 2 days against
acetate buffer.

Affinity Chromatography:

Purification of active binding protein by
affinity chromatography at 4°C on a
ouabain-amine-Sepharose column was performed. The
dilute solution of refolded protein was loaded
directly onto a pair of tandem columns, each
containing 3 ml of resin equilibrated with the 0.01 M
acetate buffer, pH 5.5. The columns were washed
individually with an excess of the acetate buffer,
and then by sequential additions of 5 ml each of 1 M
NaCl, 20 mM ouabain, and 3 M potassium thiocyanate
dissolved in the acetate buffer, interspersed with
acetate buffer washes. Since digoxin binding
activity was still present in the eluate, the eluate
was pooled and concentrated 20-fold by
ultrafiltration (PM 10 membrane, 200 ml concentrator;
Amicon), reapplied to the affinity columns, and
eluted as described. Fractions with significant
absorbance at 280 nm were pooled and dialyzed against
PBSA or the above acetate buffer. The amounts of
protein in the DE 52 and ouabain-Sepharose pools were
quantitated by amino acid analysis following dialysis
against 0.01 M acetate buffer. The results are shown
below in Table 1.



TABLE 1

Estimated Yields of BABS Protein During Purification

                                               Cleavage      Yield
                                               yield (%),    relative
                  Wet wt.      mg              prior         to fusion
Step              per l        protein         step          protein

Cell paste        12.0 g       1440.0 mg a

Fusion protein    2.3 g        480.0 mg a,b    100.0%        100.0%
granules

Acid cleavage/                 144.0 mg        38.0 c        38.0 e
DE 52 pool

Ouabain-                       18.1 mg         12.6 d        4.7 e
Sepharose pool

a Determined by Lowry protein analysis

b Determined by absorbance measurements

c Determined by amino acid analysis

d Calculated from the amount of BABS protein
specifically eluted from ouabain-Sepharose relative
to that applied to the resin; values were determined
by amino acid analysis

e Percentage yield calculated on a molar basis

Sequence Analysis of Gene and Protein:

The complete gene was sequenced in both
directions using the dideoxy method of Sanger, which
confirmed the gene was correctly assembled. The
protein sequence was also verified by protein
sequencing. Automated Edman degradation was conducted
on intact protein (residues 1-40), as well as on two
major CNBr fragments (residues 108-129 and 140-159)
with a Model 470A gas phase sequencer equipped with a
Model 120A on-line phenylthiohydantoin-amino acid
analyzer (Applied Biosystems, Foster City, CA).
Homogeneous binding protein, fractionated by SDS-PAGE
and eluted from gel strips with water, was treated
with a 20,000-fold excess of CNBr, in 1%
trifluoroacetic acid-acetonitrile (1:1), for 12 hrs at
25°C (in the dark). The resulting fragments were
separated by SDS-PAGE and transferred 
electrophoretically onto an Immobilon membrane 
(Millipore, Bedford, MA), from which stained bands 
were cut out and sequenced. 

Specificity Determination: 

Specificities of anti-digoxin 26-10 Fab and
the BABS were assessed by radioimmunoassay. Wells of
microtiter plates were coated with affinity-purified
goat anti-murine Fab fragment (ICN ImmunoBiologicals,
Lisle, IL) at 10 µg/ml in PBSA overnight at 4°C.
After the plates were washed and blocked with 1% horse
serum in PBSA, solutions (50 µl) containing 26-10
Fab or the BABS in either PBSA or 0.01 M sodium
acetate at pH 5.5 were added to the wells and
incubated 2-3 hrs at room temperature. After unbound
antibody fragment was washed from the wells, 25 µl
of a series of concentrations of cardiac glycosides
(10^-4 to 10^-11 M in PBSA) were added. The cardiac
glycosides tested included digoxin, digitoxin,
digoxigenin, digitoxigenin, gitoxin, ouabain, and
acetyl strophanthidin. After the addition of
125 I-digoxin (25 µl, 50,000 cpm; Cambridge
Diagnostics, Billerica, MA) to each well, the plates
were incubated overnight at 4°C, washed and counted.
The inhibition curves are plotted in Figure 12. The
relative affinities for each digoxin analogue were
calculated by dividing the concentration of each
analogue at 50% inhibition by the concentration of
digoxin (or digoxigenin) that gave 50% inhibition.
There is a displacement of inhibition curves for the
BABS to lower glycoside concentrations than observed
for 26-10 Fab, because less active BABS than 26-10 Fab
was bound to the plate. When 0.25 M urea was added to
the BABS in 0.01 M sodium acetate, pH 5.5, more active
sFv was bound to the goat anti-murine Fab coating on
the plate. This caused the BABS inhibition curves to
shift toward higher glycoside concentrations, closer
to the position of those for 26-10 Fab, although
maintaining the relative positions of curves for sFv
obtained in acetate buffer alone. The results,
expressed as normalized concentration of inhibitor
giving 50% inhibition of 125 I-digoxin binding, are
shown in Table 2.

TABLE 2

26-10
Antibody   Normalizing
Species    Glycoside      D     DG    DO    DOG   A-S   G     O

Fab        Digoxin        1.0   1.2   0.9   1.0   1.3   9.6   15
           Digoxigenin    0.9   1.0   0.8   0.9   1.1   8.1   13

BABS       Digoxin        1.0   7.3   2.0   2.6   5.9   62    150
           Digoxigenin    0.1   1.0   0.3   0.4   0.8   8.5   21

D = Digoxin
DG = Digoxigenin
DO = Digitoxin
DOG = Digitoxigenin
A-S = Acetyl Strophanthidin
G = Gitoxin
O = Ouabain
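
The two normalizations in Table 2 are related by a simple rescaling: dividing a digoxin-normalized row by its digoxigenin (DG) entry gives the digoxigenin-normalized row. The Python sketch below applies this to the BABS row as printed. The function name `renormalize` is hypothetical, and agreement with the printed values is only expected to within rounding, since the tabulated inputs are themselves rounded.

```python
GLYCOSIDES = ["D", "DG", "DO", "DOG", "A-S", "G", "O"]

# BABS row of Table 2, normalized to digoxin.
babs_vs_digoxin = [1.0, 7.3, 2.0, 2.6, 5.9, 62, 150]


def renormalize(values, reference):
    """Rescale relative 50% inhibition concentrations to a different reference glycoside."""
    ref = values[GLYCOSIDES.index(reference)]
    return [v / ref for v in values]


babs_vs_digoxigenin = renormalize(babs_vs_digoxin, "DG")
for name, value in zip(GLYCOSIDES, babs_vs_digoxigenin):
    print(f"{name:>4}: {value:.2f}")
```

The same rescaling applied to the Fab row reproduces its digoxigenin-normalized entries less exactly, which is consistent with the printed ratios having been computed from unrounded concentrations.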

Affinity Determination:

Association constants were measured by
equilibrium binding studies. In immunoprecipitation
experiments, 100 µl of 3 H-digoxin (New England
Nuclear, Billerica, MA) at a series of concentrations
(10^-7 M to 10^-11 M) were added to 100 µl of
26-10 Fab or the BABS at a fixed concentration.
After 2-3 hrs of incubation at room temperature, the
protein was precipitated by the addition of 100 µl
goat antiserum to murine Fab fragment (ICN
ImmunoBiologicals), 50 µl of the IgG fraction of rabbit
anti-goat IgG (ICN ImmunoBiologicals), and 50 µl of
a 10% suspension of protein A-Sepharose (Sigma).
Following 2 hrs at 4°C, bound and free antigen were
separated by vacuum filtration on glass fiber filters
(Vacuum Filtration Manifold, Millipore, Bedford,
MA). Filter disks were then counted in 5 ml of
scintillation fluid with a Model 1500 Tri-Carb Liquid
Scintillation Analyzer (Packard, Sterling, VA). The
association constants, K o , were calculated from
Scatchard analyses of the untransformed radioligand
binding data using LIGAND, a non-linear curve fitting
program based on mass action. K o values were also
calculated by Sips plots and binding isotherms shown
in Figure 13A for the BABS and 13B for the Fab. For
binding isotherms, data are plotted as the
concentration of digoxin bound versus the log of the
unbound digoxin concentration, and the dissociation
constant is estimated from the ligand concentration
at 50% saturation. These binding data are also
plotted in linear form as Sips plots (inset), having
the same abscissa as the binding isotherm but with
the ordinate representing log r/(n-r), defined
below. The average intrinsic association constant
(K o ) was calculated from the modified Sips equation
(39), log [r/(n-r)] = a log C - a log K o , where r
equals moles of digoxin bound per mole of antibody at
an unbound digoxin concentration equal to C; n is the
number of moles of digoxin bound at saturation of the
antibody binding site, and a is an index of
heterogeneity which describes the distribution of
association constants about the average intrinsic
association constant K o . Least squares linear
regression analysis of the data indicated correlation
coefficients for the lines obtained were 0.96 for the
BABS and 0.99 for 26-10 Fab. A summary of the
calculated association constants is shown below in
Table 3.

TABLE 3

                      Association Constant, K o

Method of Data
Analysis              K o (BABS), M^-1         K o (Fab), M^-1

Scatchard plot        (3.2 ± 0.9) X 10^7       (1.9 ± 0.2) X 10^8
Sips plot             2.6 X 10^7               1.8 X 10^8
Binding isotherm      5.2 X 10^7               3.3 X 10^8
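
The Sips analysis amounts to a least squares line through the points (log C, log [r/(n-r)]). Its slope is the heterogeneity index a, and its zero crossing gives the free ligand concentration at half saturation (r = n/2), whose reciprocal is taken as the average association constant. The Python sketch below demonstrates this on synthetic data generated from assumed values of K and a. The data are not measurements from this work, and the function name `sips_fit` is hypothetical.

```python
import math


def sips_fit(conc, r, n):
    """Least squares line through (log C, log r/(n-r)); returns (a, K_assoc)."""
    xs = [math.log10(c) for c in conc]
    ys = [math.log10(ri / (n - ri)) for ri in r]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    intercept = my - a * mx
    c_half = 10 ** (-intercept / a)   # free concentration where r = n/2
    return a, 1.0 / c_half


# Synthetic binding data (assumed K and a; n = 1 site per molecule).
K_TRUE, A_TRUE, N_SITES = 3.2e7, 0.9, 1.0
conc = [10 ** e for e in (-9.0, -8.5, -8.0, -7.5, -7.0, -6.5)]
r = [N_SITES * (K_TRUE * c) ** A_TRUE / (1 + (K_TRUE * c) ** A_TRUE) for c in conc]

a_fit, k_fit = sips_fit(conc, r, N_SITES)
print(f"a = {a_fit:.3f}, K = {k_fit:.3e} per M")
```

With noise-free synthetic input the fit returns the assumed parameters. With real data the correlation coefficient of the line, as quoted in the text, indicates how well the single-index model describes the binding.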



III. Synthesis of a Multifunctional Protein



A nucleic acid sequence encoding the single 
chain binding site described above was fused with a 
sequence encoding the FB fragment of protein A as a 
leader to function as a second active region. As a
spacer, the native amino acids comprising the last 11
amino acids of the FB fragment bonded to an Asp-Pro
dilute acid cleavage site were employed. The
binding domain of the FB consists of the immediately
preceding 43 amino acids which assume a helical
configuration (see Fig. 2B).

The gene fragments are synthesized using a
Biosearch DNA Model 8600 Synthesizer as described
above. Synthetic oligonucleotides are cloned
according to established protocol described above
using the pUC8 vector transfected into E. coli. The
completed fused gene set forth in Figure 6A is then
expressed in E. coli.

After sonication, inclusion bodies were
collected by centrifugation, and dissolved in 6 M
guanidine hydrochloride (GuHCl), 0.2 M Tris, and 0.1 M 
2-mercaptoethanol (BME) , pH 8.2. The protein was 
denatured and reduced in the solvent overnight at room 
temperature. Size exclusion chromatography was used 
to purify fusion protein from the inclusion bodies. A 
Sepharose 4B column (1.5 X 80 cm) was run in a solvent 
of 6 M GuHCl and 0.01 M NaOAc, pH 4.75. The protein 
solution was applied to the column at room temperature 
in 0.5-1.0 ml amounts. Fractions were collected and 
precipitated with cold ethanol. These were run on SpS 
gels, and fractions rich in the recombinant protein 
(approximately 34,000 D) were pooled. This offers a 
simple first step for cleaning up inclusion body 
preparations without suffering significant proteolytic 
degradation. 

For refolding, the protein was dialyzed
against 100 ml of the same GuHCl-Tris-BME solution,
and the dialysate was diluted 11-fold over two days to
0.55 M GuHCl, 0.01 M Tris, and 0.01 M BME. The
dialysis sacks were then transferred to 0.01 M NaCl,
and the protein was dialyzed exhaustively before being
assayed by RIAs for binding of ¹²⁵I-labelled
digoxin. The refolding procedure can be simplified by
making a rapid dilution with water to reduce the GuHCl
concentration to 1.1 M, and then dialyzing against
phosphate buffered saline (0.15 M NaCl, 0.05 M
potassium phosphate, pH 7, containing 0.03% NaN₃),
so that the protein is free of any GuHCl within 12 hours.
Product of both types of preparation showed binding
activity, as indicated in Figure 7A.
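
The GuHCl figures in the two refolding routes follow from simple dilution arithmetic, sketched below. The sketch assumes the starting concentration is the 6 M GuHCl of the solubilization buffer. The helper names are illustrative and not part of the original procedure.

```python
def concentration_after_dilution(start_molar, fold):
    """Concentration reached after diluting `fold`-fold."""
    return start_molar / fold


def water_volumes_to_add(start_molar, target_molar):
    """Volumes of water to add per volume of sample to reach target_molar."""
    return start_molar / target_molar - 1.0


START_GUHCL = 6.0  # M, taken from the solubilization buffer described in the text

# Slow route: 11-fold dilution of the dialysate over two days.
slow_final = concentration_after_dilution(START_GUHCL, 11)

# Fast route: rapid dilution with water down to 1.1 M GuHCl.
fast_water = water_volumes_to_add(START_GUHCL, 1.1)

print(f"slow route final GuHCl: {slow_final:.2f} M")
print(f"fast route: add {fast_water:.2f} volumes of water per volume of sample")
```

On these assumptions the slow route ends near 0.55 M GuHCl, and the fast route corresponds to roughly a 5.5-fold dilution.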

Demonstration of Bifunctionality:

This protein with an FB leader and a fused
BABS is bifunctional; the BABS can bind the antigen
and the FB can bind the Fc regions of
immunoglobulins. To demonstrate this dual and
simultaneous activity, several radioimmunoassays were
performed.

Properties of the binding site were probed by
a modification of an assay developed by Mudgett-Hunter
et al. (J. Immunol. (1982) 129:1165-1172; Molec.
Immunol. (1985) 22:477-488), so that it could be run
on microtiter plates as a solid phase sandwich assay.
Binding data were collected using goat anti-murine Fab
antisera (gAmFab) as the primary antibody that
initially coats the wells of the plate. These are
polyclonal antisera which recognize epitopes that
appear to reside mostly on framework regions. The
samples of interest are next added to the coated wells
and incubated with the gAmFab, which binds species
that exhibit appropriate antigenic sites. After
washing away unbound protein, the wells are exposed to
¹²⁵I-labelled (radioiodinated) digoxin conjugates,
either as ¹²⁵I-dig-BSA or ¹²⁵I-dig-lysine.

The data are plotted in Figure 7A, which
shows the results of a dilution curve experiment in
which the parent 26-10 antibody was included as a
control. The sites were probed with ¹²⁵I-dig-BSA as
described above, with a series of dilutions prepared
from initial stock solutions, including both the
slowly refolded (1) and fast diluted/quickly refolded
(2) single chain proteins. The parallelism between
all three dilution curves indicates that gAmFab
binding regions on the BABS molecule are essentially
the same as on the Fv of authentic 26-10 antibody,
i.e., the surface epitopes appear to be the same for
both proteins.

The sensitivity of these assays is such that
the binding affinity of the Fv for digoxin must be at
least 10⁶ M⁻¹. Experimental data on digoxin binding
yielded binding constants in the range of 10⁸ to
10⁹ M⁻¹. The parent 26-10 antibody has an
affinity of 5.4 × 10⁹ M⁻¹. Inhibition assays also
indicate that the binding of ¹²⁵I-dig-lysine can be
inhibited by unlabelled digoxin, digoxigenin,
digitoxin, digitoxigenin, gitoxin, acetyl
strophanthidin, and ouabain in a way largely parallel
to the parent 26-10 Fab. This indicates that the
specificity of the biosynthetic protein is
substantially identical to that of the original
monoclonal.

In a second type of assay, digoxin-BSA is
used to coat microtiter plates. Renatured BABS
(FB-BABS) is added to the coated plates so that only
molecules that have a competent binding site can stick
to the plate. ¹²⁵I-labelled rabbit IgG
(radioligand) is mixed with bound FB-BABS on the
plates. Bound radioactivity reflects the interaction
of IgG with the FB domain of the BABS, and the
specificity of this binding is demonstrated by its
inhibition with increasing amounts of FB, Protein A,
rabbit IgG, IgG2a, and IgG1, as shown in Figure 7B.

The following species were tested in order to
demonstrate authentic binding: unlabelled rabbit IgG
and IgG2a monoclonal antibody (which bind
competitively to the FB domain of the BABS); and
protein A and FB (which bind competitively to the
radioligand). As shown in Figure 7B, these species
are found to completely inhibit radioligand binding,
as expected. A monoclonal antibody of the IgG1
subclass binds poorly to the FB, as expected,
inhibiting only about 34% of the radioligand from
binding. These data indicate that the BABS domain and
the FB domain have independent activity.
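
An inhibition figure such as the roughly 34% quoted for the IgG1 antibody is conventionally derived from bound counts measured with and without inhibitor. The sketch below shows that calculation. The function name, the optional background term, and the counts are hypothetical illustrations, not values from the experiments described.

```python
def percent_inhibition(cpm_uninhibited, cpm_inhibited, cpm_background=0.0):
    """Percent of radioligand binding blocked by an inhibitor."""
    signal = cpm_uninhibited - cpm_background
    if signal <= 0:
        raise ValueError("uninhibited signal must exceed background")
    return 100.0 * (1.0 - (cpm_inhibited - cpm_background) / signal)


# Hypothetical counts per minute, chosen only to illustrate the arithmetic.
partial = percent_inhibition(10000.0, 6600.0)   # a poorly competing inhibitor
complete = percent_inhibition(10000.0, 0.0)     # a fully competing inhibitor
print(f"partial: {partial:.0f}%  complete: {complete:.0f}%")
```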

IV. Other Constructs

Other BABS-containing proteins constructed
according to the invention, expressible in E. coli and
other host cells as described above, are set forth in
the drawing. These proteins may be bifunctional or
multifunctional. Each construct includes a single
chain BABS linked via a spacer sequence to an effector
molecule comprising amino acids defining a
biologically active effector protein such as an
enzyme, receptor, toxin, or growth factor. Some
examples of such constructs shown in the drawing
include proteins comprising epidermal growth factor
(EGF) (Figure 15A), streptavidin (Figure 15B), tumor
necrosis factor (TNF) (Figure 15C), calmodulin (Figure
15D), the beta chain of platelet derived growth factor
(B-PDGF) (Figure 15E), ricin A (Figure 15F),
interleukin 2 (Figure 15G), and an FB dimer (Figure
15H). Each is used as a trailer and is
connected to a preselected BABS via a spacer
(Gly-Ser-Gly) encoded by DNA defining a BamHI
restriction site. Additional amino acids may be added
to the spacer for empirical refinement of the
construct, if necessary, by opening up the BamHI site
and inserting an oligonucleotide of a desired length
having BamHI sticky ends. Each gene also terminates
with a PstI site to facilitate insertion into a
suitable expression vector.
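
The link between the Gly-Ser-Gly spacer and the BamHI site can be checked directly: the BamHI recognition sequence GGATCC reads in frame as Gly-Ser. The sketch below illustrates this with a deliberately partial codon table. The third codon (GGT) and the helper names are assumptions made for illustration; the actual codons used in the constructs are those shown in the drawing.

```python
BAMHI_SITE = "GGATCC"  # BamHI recognition sequence

# Partial codon table: only the glycine and serine codons needed here.
CODON_TO_RESIDUE = {
    "GGA": "Gly", "GGC": "Gly", "GGG": "Gly", "GGT": "Gly",
    "TCC": "Ser", "TCT": "Ser",
}


def translate(dna):
    """Translate an in-frame DNA string using the partial table above."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [CODON_TO_RESIDUE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def keeps_reading_frame(net_added_bp):
    """An insertion preserves the downstream frame only if it adds whole codons."""
    return net_added_bp % 3 == 0


spacer_dna = BAMHI_SITE + "GGT"   # hypothetical choice for the final Gly codon
print("-".join(translate(spacer_dna)))
print(keeps_reading_frame(15), keeps_reading_frame(16))
```

The frame check reflects the constraint on any oligonucleotide inserted at the opened site: the net length added must be a multiple of three bases so that the trailer remains in frame.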

The BABS of the EGF and PDGF constructs may
be, for example, specific for fibrin so that the EGF
or PDGF is delivered to the site of a wound. The BABS
for TNF and ricin A may be specific to a tumor
antigen, e.g., CEA, to produce a construct useful in
cancer therapy. The calmodulin construct binds
radioactive ions and other metal ions. Its BABS may
be specific, for example, to fibrin or a tumor
antigen, so that it can be used as an imaging agent to
locate a thrombus or tumor. The streptavidin
construct binds biotin with very high affinity.
The biotin may be labeled with a remotely detectable
ion for imaging purposes. Alternatively, the biotin
may be immobilized on an affinity matrix or solid
support. The BABS-streptavidin protein could then be
bound to the matrix or support for affinity
chromatography or solid phase immunoassay. The
interleukin-2 construct could be linked, for example,
to a BABS specific for a T-cell surface antigen. The
FB-FB dimer binds to Fc, and could be used with a BABS
in an immunoassay or affinity purification procedure
linked to a solid phase through immobilized
immunoglobulin.

Figure 14 exemplifies a multifunctional 
protein having an effector segment as a leader. It 
comprises an FB-FB dimer linked through its C-terminus
via an Asp-Pro dipeptide to a BABS of choice. It
functions in a way very similar to the construct of 
Fig. 15H. The dimer binds avidly to the Fc portion of 
immunoglobulin. This type of construct can 
accordingly also be used in affinity chromatography, 
solid phase immunoassay, and in therapeutic contexts 
where coupling of immunoglobulins to another epitope 
is desired. 

In view of the foregoing, it should be 
apparent that the invention is unlimited with respect 
to the specific types of BABS and effector proteins to 
be linked. Accordingly, other embodiments are within 
the following claims. 

What is claimed is: 
Claims

1. A single chain multi-functional biosynthetic
protein expressed from a single gene derived by
recombinant DNA techniques, said protein comprising:

a biosynthetic antibody binding site capable 
of binding to a preselected antigenic determinant and 
comprising at least one protein domain, the amino 
acid sequence of said domain being homologous to at 
least a portion of the sequence of a variable region 
of an immunoglobulin molecule capable of binding said 
preselected antigenic determinant; and, peptide 
bonded thereto, 

a polypeptide selected from the group 
consisting of effector proteins having a conformation 
suitable for biological activity in mammals, amino 
acid sequences capable of sequestering an ion, and 
amino acid sequences capable of selective binding to 
a solid support. 

2. The protein of claim 1 wherein said binding 
site comprises at least two domains connected by 
peptide bonds to a polypeptide linker. 

3. The protein of claim 2 wherein said two
domains mimic a V_H and a V_L from a natural
immunoglobulin.

4. The protein of claim 2 wherein the amino
acid sequence of each of said domains comprises a set 
of CDRs interposed between a set of FRs, each of 
which is respectively homologous with at least a 
portion of CDRs and FRs from a said variable region 
of an immunoglobulin molecule capable of binding said 
preselected antigenic determinant. 

5. The protein of claim 4 wherein at least one 
of said domains comprises a said set of CDRs 
homologous to a portion of the CDRs in a first 
immunoglobulin and a set of FRs homologous to a 
portion of the FRs in a second, distinct 
immunoglobulin. 

6. The protein of claim 2 wherein said
polypeptide linker spans a distance of at least 40
angstroms and is hydrophilic.

7. The protein of claim 2 wherein said 
polypeptide linker comprises amino acids which 
together assume an unstructured polypeptide 
configuration in aqueous solution. 

8. The protein of claim 2 wherein said
polypeptide linker is cysteine-free.

9. The protein of claim 2 wherein said
polypeptide linker comprises a plurality of glycine
or alanine residues.

10. The protein of claim 2 wherein said
polypeptide linker comprises plural consecutive
copies of an amino acid sequence.

11. The protein of claim 2 wherein said 
polypeptide linker comprises one or a pair of amino 
acid sequences recognizable by a site specific 
cleavage agent. 

12. The protein of claim 4 wherein said antibody 
binding site binds with said antigenic determinant 
with a specificity at least substantially identical 
to the binding specificity of said immunoglobulin 
molecule. 

13. The protein of claim 4 wherein said antibody
binding site binds said antigenic determinant with an
affinity of at least about 10⁶ M⁻¹.

14. The protein of claim 4 wherein said antibody 
binding site binds said antigenic determinant with an 
affinity no less than about two orders of magnitude 
less than the binding affinity of said immunoglobulin 
molecule. 

15. The protein of claim 1 further comprising a 
polypeptide spacer incorporated therein interposed 
between said antibody binding site and said 
polypeptide. 

16. The protein of claim 15 wherein said
polypeptide spacer comprises amino acids selectively
susceptible to cleavage.

17. The protein of claim 15 wherein said spacer
is hydrophilic.

18. The protein of claim 15 wherein said spacer
comprises amino acids which together assume an
unstructured polypeptide configuration in aqueous
solution.

19. The protein of claim 15 wherein said spacer
is cysteine-free.

20. The protein of claim 15 wherein said spacer
comprises a plurality of glycine or alanine residues.

21. The protein of claim 15 wherein said spacer
comprises plural consecutive copies of an amino acid
sequence.

22. The protein of claim 1 wherein said effector 
protein is an enzyme, toxin, receptor, binding site, 
biosynthetic antibody binding site, growth factor, 
cell-differentiation factor, lymphokine, cytokine, 
hormone, or anti-metabolite. 

23. The protein of claim 1 wherein said sequence
capable of sequestering an ion is calmodulin,
metallothionein, a fragment thereof, or an amino acid
sequence rich in at least one of glutamic acid,
aspartic acid, lysine, and arginine.

24. The protein of claim 1 wherein said
polypeptide sequence capable of selective binding to 
a solid support is a positively or negatively charged 
amino acid sequence, a cysteine-containing amino acid 
sequence, streptavidin, or a fragment of protein A. 

25. The protein of claim 1 comprising a 
plurality of biosynthetic antibody binding sites. 

26. The protein of claim 1 comprising an
additional biofunctional domain.

27. A DNA encoding the protein of claim 1. 

28. A host cell harboring and capable of 
expressing the DNA of claim 27. 

29. A biosynthetic binding protein expressed
from DNA derived by recombinant techniques,

said binding protein comprising a single
polypeptide chain comprising at least two polypeptide
domains connected by a polypeptide linker, the amino
acid sequence of each of said polypeptide domains
comprising a set of CDRs interposed between a set of
FRs, each of which is respectively homologous with at
least a portion of CDRs and FRs from an
immunoglobulin molecule,

at least one of said domains comprising a
said set of CDR amino acid sequences homologous to a
portion of the CDR amino acid sequences of a first
immunoglobulin molecule, and a set of FR amino acid
sequences homologous to a portion of the FR sequences
of a second, distinct immunoglobulin molecule,

said polypeptide domains together defining a
hybrid synthetic binding site having specificity for
a preselected antigen.

30. The binding protein of claim 29 wherein said 
domains comprise FRs homologous to a portion of the 
FRs of a human immunoglobulin. 

31. The binding protein of claim 29 wherein said 
polypeptide domains are peptide bonded to a 
biologically active amino acid sequence. 

32. The binding protein of claim 29 further 
comprising a radioactive atom bound to said binding 
protein. 

33. A DNA encoding the binding protein of claim 
32. 

34. A host cell harboring and capable of 
expressing the DNA of claim 33. 

35. A biosynthetic binding protein expressed
from DNA derived by recombinant techniques,

said binding protein comprising a single
polypeptide chain comprising at least two polypeptide
domains connected by a polypeptide linker, the amino
acid sequence of each of said polypeptide domains
comprising a set of CDRs interposed between a set of
FRs, each of which is respectively homologous with at
least a portion of CDRs and FRs from an
immunoglobulin molecule,

said polypeptide linker comprising plural,
peptide-bonded amino acids defining a polypeptide of
a length sufficient to span the distance between the
C-terminal end of one of said domains and the
N-terminal end of the other of said domains when said
binding protein assumes a conformation suitable for
binding, and comprising hydrophilic amino acids which
together assume an unstructured polypeptide
configuration in aqueous solution,

said binding protein being capable of
binding to a preselected antigenic site, determined
by the collective tertiary structure of said sets of
CDRs held in proper conformation by said sets of FRs
and said linker when disposed in aqueous solution.

36. The binding protein of claim 35 wherein said
polypeptide linker spans a distance of at least about
40 Å when said binding protein is disposed in aqueous
solution in a conformation suitable for binding said
preselected antigen.

37. The binding protein of claim 35 wherein said 
polypeptide linker comprises a plurality of glycine 
or alanine residues. 

38. The binding protein of claim 35 wherein said 
linker comprises plural consecutive copies of an 
amino acid sequence. 



39. The binding protein of claim 35 wherein said
linker comprises (Gly-Gly-Gly-Gly-Ser)₃.

40. The binding protein of claim 35 wherein at
least one of said domains comprises a said set of
CDRs homologous to a portion of the CDRs in a first
immunoglobulin and a set of FRs homologous to a
portion of the FRs of a second, distinct, human
immunoglobulin.

41. The binding protein of claim 35 wherein at 
least one of said polypeptide domains is peptide 
bonded to a biologically active amino acid sequence. 

42. The binding protein of claim 35 further 
comprising a radioactive atom bound to said 
polypeptide chain. 

43. A biosynthetic binding protein expressed
from DNA derived by recombinant techniques,

said binding protein comprising a single
polypeptide chain comprising at least two polypeptide
domains connected by a polypeptide linker, the amino
acid sequence of each of said polypeptide domains
comprising a set of CDRs interposed between a set of
FRs, each of which is respectively homologous with
at least a portion of CDRs and FRs from an
immunoglobulin molecule,

said binding protein being capable of
binding to a preselected antigenic determinant,
determined by the collective tertiary structure of
said sets of CDRs held in proper conformation by said
sets of FRs when disposed in aqueous solution, with a
binding specificity at least substantially identical
to the binding specificity of said immunoglobulin
molecule comprising said homologous CDRs.

44. A biosynthetic binding protein expressed
from DNA derived by recombinant techniques,

said binding protein comprising a single
polypeptide chain comprising at least two polypeptide
domains connected by a polypeptide linker, the amino
acid sequence of each of said polypeptide domains
comprising a set of CDRs interposed between a set of
FRs, each of which is respectively homologous with
at least a portion of CDRs and FRs from an
immunoglobulin molecule,

said binding protein being capable of
binding to a preselected antigenic determinant,
determined by the collective tertiary structure of
said sets of CDRs held in proper conformation by said
sets of FRs when disposed in aqueous solution, with a
binding affinity of at least 10⁶ M⁻¹.

45. The binding protein of claim 43 or 44 having
a binding affinity of at least about 10⁸ M⁻¹.

46. The binding protein of claim 43 or 44 having
a binding affinity no less than two orders of
magnitude less than the binding affinity of said
immunoglobulin molecule comprising said homologous
CDRs.

47. The binding protein of claim 43 or 44
wherein at least one of said polypeptide domains is
peptide bonded to a biologically active amino acid
sequence.

48. The binding protein of claim 43 or 44
further comprising a radioactive atom bound to said
polypeptide chain.
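
The linker claims relate a repeated Gly-Gly-Gly-Gly-Ser sequence to a span of at least about 40 Å. A rough plausibility check is sketched below. It assumes a rise of about 3.5 Å per residue for an extended chain, an approximate figure that is not stated in the text, and a three-repeat linker matching the GGGGSGGGGSGGGGS segment in the sequence listings. All names are illustrative.

```python
import math

RISE_PER_RESIDUE_ANGSTROM = 3.5  # assumed approximate value for an extended chain


def max_span(n_residues, rise=RISE_PER_RESIDUE_ANGSTROM):
    """Upper-bound end-to-end span of a fully extended peptide, in angstroms."""
    return n_residues * rise


def min_residues(distance_angstrom, rise=RISE_PER_RESIDUE_ANGSTROM):
    """Fewest residues whose extended span reaches the given distance."""
    return math.ceil(distance_angstrom / rise)


linker = ["Gly", "Gly", "Gly", "Gly", "Ser"] * 3   # three repeats (assumed)
print(len(linker), "residues, extended span about", max_span(len(linker)), "angstroms")
print("residues needed to reach 40 angstroms:", min_residues(40.0))
```

Because an unstructured linker in solution is rarely fully extended, the extended span is only an upper bound; the margin above 40 Å is what gives such a linker room to bridge the two domains.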



WO 88/09344 



PCT/US88/01737 



1/31 




FIG. 1A

[Schematic drawing not reproduced. Legible labels: FR and CDR segments; AMINO ACID SEQUENCE NO.; DNA BASE NO.]

FIG. 1B




no. id 






10 • » JO «o so 



EooII ro«*HI 3.U96I J^" JJjJ. 



AO 90 100 no 



Ecotll ttttXHlatll 
HaaXX r»pl 
Hhal 
HlaPI 
Mart 

KUIV 
Serf I 
Aeyt 

1*0 130 110 



CCAAATCCTCTCCCTlCATTTTCACCCACTTCTACATOAATTCCCTTCCCCACTCTCAfcCTAACTCTCT 
Rial H£ht Nlalll B»tXI HUIII Xbm 

HI 

150 '*0 170 180 190 



*C*CTACATCCCCTACATTTCCCCATACTCTCCCCTTACCCCCTACAACCACAACTTTAiACCTliCCee 
■A.pT,rll.Cl y T y Ml.S.rPr,Trr3.^^^ 

* «»* BatCH Oral 

" HpaXI 

HaaIXt 

220 230 "0 250 260 270 J*n 

if ^if^S^^^*^***^^^^^^^**^^^^^*^*^^*'^^^^^TCTTTCACCTCTCACCACTCCC 

i"i„ Hb0 " H1 OdaX Hlnfim 

J 1 " 11 *l*IIXBb*I sae 

Sf 1 * rnu.HI 
Taql 

CCCTATACTATTCCCCCOCCTCCTCTOCTAACAAATCCCCCA« 
UUlTyrTyPCMAl.Cly^ 

HlaPHl.IV M . m -Sjaij 

3.U96I HhtI 

**** HinPI 
NarX 

360 3T0 Kl.III 

J*CTCTATCCTCATACCAfC^ J 1 "" 

^hr Y.lSerSar *aaAap * C)r4 
BaafM 

BlaIV err M\ 

S*u3A FI-Ul. Tft 

XhoII 






10 20 30* «0 50 60 70 

CAATTCCACCTCCTAATGACCCACACTCCGCTGTCTCTGCCCGTTTCTCTCGGTGACCACGCTTCTATTT 
GlurhaAspValValMatThrGlnThrProLauSarlauProValSarLauGXyAapGlnAlaSarllaS 
EcoRI AatXI Hlnfl flpaXX BatCIZ 

AhaXI Kpht CeoRII 

TaqX Scrri 
AeyX Maalll 
MaalX 

80 90 100 110 . 120 130 no 

CTTCCCCCTCTTCCCACTCTCTCCTCCATTCTAATCCTAACACTTACCTCAACTCCTACCTCCAAAACCC 
trCys ArtStrStrGlnSarLauValHl aSar AanGI yAsnThrTyrx a uAanTrpTyrLauGlntya Al 
FnuiHl AvaXX HaaXXX HglEXX BanX 

HboII BatXX KpnX 

Sau96t MlalV 

Rami 

ISO 160 170 160 190 200 210 

TCCTCACTCTCCCAACCTTCTCATCTACAAACTCTCTAACCCCTTCTCTCGTCTCCCCCATCCTTTCTCT 
»G1 yGlnStrf roLr a LauL«uIltTy r LysVi IS tr A »nAritPh« 5 •r GlyVal fro AtpAr if haSar 
Alul Sau3A HpaXI 
HlndXII XoiX5au3A 

Sorri 

220 Z30 2«0 250 260 270 260 

CGTTCTCCTTCTCGTACTCACTTCACCCTGAACATCTCTCCTGTCCACGCCCACO.TCTCCCTATCTACT 
ClyStrClyS«rGlyThrA»pfhtThrLauLyaIleSirArtValCluAiaOluAapL«uGlyIlaTyrf 
Raal KphX Bflll Ta^IHaallX SauSA 

MboII XhoII 
SauJA 
XhoXI 

290 300 310 320 330 3*0 350 

TCTGCTCTCAGACTACTCATGTACCCCCCACCTTCCCCCGTCGCACCAAGCTCGAGATCAAACGTTGAGGATCC 
HaCraSarCln ThPthrH 1 aval Prof ro ThrfhaGlyClrGlrThrLvaLauGlullaLrflArfop 

DdaX NlalXX HglCXX BanX AluX 5au3A Haall BamHX 

RaaX MlaXY Aval NlaXV 

Taql Sau3A 
XhoX XhoII 
10 20 30 40 50 60 70

CAATTCCAJCCTTC1ACTCCAOCAOTCTCCTC&TCAATT0CTTAAACCTCCCOCCTCTCTCCCCATCTCCT 

CluPhtCluVAiClnL»uClnClnS«pClyProCluL«uV*lLy3ProClyAi*S«rValAriMttS«rC 
A*aII bdvX AT»II Ab.ii Hh4l 

EcoRI fnu*HI 5au96X BanI HlnPI 

T*ql P»tl EooRXI HatlHlalll 

Hfttiz r.pi 

HhaX 
HtnPI 
Marl 
HlalV 
AoyX 

80 90 100 110 120 MO lio 

CCAAATCCTCTGGGTACATTTTCACC^ 

yaLysStrStrGlyTyrlltPhtThrAanryrTyriltMlaTrpvaAArgClnStrHiaClyLyaStrLe 
"«I HpM fokl BatX I NlalU xb» 

Ma 

150 160 170 180 190 200 210 

ACACTACATCCCCTCCATCTACCCCCCTAATCCTAACACTAACTACT-ACAATCACAACTTT AAACeTAAC 

TCATCTCTCCCACCTACATCCCCCC ATTACC ATTOTCATTCATCATCTTACTCTTCA A A 
uA*pjyrii«giyTrpil«TyrProCiyAanGlyA»aThrLy»TyrTyrAanCiuAanpmLy3ClyLya 
I Sau3A Aral MatlXIDdalRaal Oral 

•I Xholl Hpall seal 

MeiX 
Moil 
Saal 
XaaX . 

220 230 2*0 250 260 270 280 

CCOACCCTTACTCTCOACAAATCTTCCTCAACTCCTTACATCCACCTCCCTTCTTTCACCTCTCACCACT 
AlaThrLtuThpValAapLyaS«rSarS«rThrAlaTyr«atCluLauArgS«rLauThrSepCluAapS 
Acer MboIX Alul DdeX Hinf 

Hindi HlalXXBbTl 

Fnu«HI 

TaqX 

290 300 310 320 330 3«0 350 

CCCCCCTATACTATTCCCCCCGCTCCTCTCCTAACAAAT CCCCCTTCCATTACTCCCCTCATCC CCCCTC 

CCAAGCTAATGAC CC CACTACCCC 

• rAlaV«lTyrTyrCyaAlaClySarSapClyAanLy3TrpAiaPn«AapTypTrpGlyMiaClyAlaSa 
I AceX HhalBanll HaaXIX HaaXII AhaXX 

FbuOIX FnuOII Sau96iTaqI Ban! 

S*eXI HinPIHlalV Haall 

Hhal 
HinPI 

360 370 Mapl 

TCTTACTCTATCCTCATAGCATCC NlallT" 
PValThPValSepSep«ao NlalV 
MaelXX BaaHl Ac yl 

HlalV 

Sau 3 A Fl&l. HC 

XhoII 
10 20 30 40 50 60 70

CAATTCCACCTCCTiATCACCCACACTCCCCTCTCTCTCCCCCTTTCTCTCCCTOACCACCCTTCTlTTT 
Cluf htA»pV*iV*lM«tThrClnThrProL«u5«rL«uProV«lS«rLtuClyA»pClnAl*3«rIltS 
EcoRI Aatll Hlnfl HpaXI BstCZI 

AhaXI HphI tcoRII 

TaqI Sorfl 
Acyl MaaXIX 

Ml.II 

60 90 100 110 120 130 MO 

CTTCCCCCTCTTCCC ACTCTATTCTCCACTCTAATCCTAAC ACTTACCTCOATTCOTAC CTCC AA AACCC 

AACCCCflA&AA&6fif^A(iATA*CAtCTCAGATTACCATTCTCAATdflAd&TA"IC 
erCysAr(S«r5«rGln5trIltvalKla£«rA»nGlrA»nrnr T yrL«uA j pTrpTy rLtuGlnLy* A 1 
Pnu«KI HglAI MaalXI Eeoftll BanI 

mdoII SorTI Kpnl 

Hgicn FTaTv 
Raal 

150 160 170 180 190 200 210 

TC3TCACTCTCCCAACCTTCTCATCTACAAACTCTCTAACCCCTTCTCTCGTCTCCCCCATCCTTTCTCT 
aClyClnS«rProLysL«uLtutltTrrly9ValStrAanArfPhtS«rClyValProA3pArifht5«r 
AluX Sau3A KpalX 
HinalXI NelXSauSA 

SorFI 

220 230 2«0 250 260 270 280 

CCTTCTCCTTCTCCTACTCACTTCACCCTCAACATCTCTCCTCTCCACC CCCACCATCTCCCTATCTACT 

CCC T CC T aSaCCCa T aI THTT 

Cly5erClyS«rClyThrA9pPh«ThrL«uLraXl«SarArsValCluAlaCluAapL«uClyXlaTrrT 
JtaaX HphX Bflll Taql HatlXI Sau3A 

MboXX XnoII 
Sau3A 
XhoII 

290 300 310 320 330 3*0 350 

ACTCCTTCCACCCCTCTCATOTACCCTCCACCTTCCCCCCTCCCACCAACC TCCACATCAAACCTTCACCATCC 
TCACCAACflTcCgdACAGTACATCCCACCTOOAAOCCCCCACCCTOCTTCCACCT 
rrCy9Pn«Cla01ySarHlaVal?roTrpThrPh*ClyClyClyTtirLyaL«uGluXlcLyaArc*ep 

EcoBXX MlaXIX Avail Banl AluX Sau3A Maall BaoHX 

Serri ItaaX Sau96X NlaXV Aval NlalV 

HglCXX Taql Sau3A 

Xhol XhoII 



riGi. Mb 
10 20 30 40 50 60 70

SJ A II C i TC SJ A S T JJ AACTCCAACiATCTCCCCCC6CTCT6CT * CCTCC CtCTCACACTCTCTCCCTCA 
S 1U !?:? t !5} u I ai ; iDL,uClaCIoS "01y^oClyL.g¥.lAr l froS«rClnTbrL # u3.rL.uT 
EcoBIlUallX ApalHpall taal OdtlHlofI 

•"H Hatlt TthlllX 

H«tt:t 

Met I 

3au96X 
3»u96I 
3orfI 

80 - 90 100 tlO 120 '30 iao 

CTTCTACCCTATCCCCATCCACCTTCTCTAACTACTACATCCATTCGCTCCCTCAACCCCCCCCTCCTCC 
hrCyarhrV.13TCly3trThrPh«3TAjn TrrTyrIltHta TrDV.lArtClnPr n Pr Q ci.w t r.i 
Haal BaaHl foMl AviII HlncII HpalX 

H P* IX NlalV Hell 

"1»IV 3au96I SerFI 

3au3A 
IhoII 

150 ^0 170 180 190 200 210 

TCTCCACTCCATCCCTTCCATTTACCCCCOTAATCCTAACACTAACT ACTACA ATCACAACTTTAAA06C 
yLauCluTrpIltClyTrpIltTyrPr eClyAanClyAsnThrtyiTyrTYrAanCluAanfhtLya Cly 

Aval 3»u3A juil Ha.IIIOatliaal 67al M 

Hpall Scil 3 

Ihel Moll 

Molt 
Sorri 

SerFI 
Sail 
Xaal 

230 230 240 250 260 270 260 

ATCCTCCTCOACACTTCTAAOAACCAATTCTCTCTCCGTCTCTCTTCTCTTACCCCCCCTCATACTCCTC 
MatLtuyaXA*pThrSartMAanClnPhtSarLauAr|Lttt3«r3aryalThrAlaAlaAjpThrAlaV 
lalll Aeel OdttXmnt H,al HboII Haa ItlFnuHHI 

hi J 1 ?? 11 BbvII rnoCtl 

5|U f S.oII 
TaqI 

290 300 310 320 330 300 350 

TCTACTACTCCCCCCCTTCCTCCOCTAATiAOTCCCCATTTCATTACTCCCCCCACCCCTCTCTCOTCAC 
alTyrTyrCyaAlaAr«5trSerClyAanL y3TrpAlaPhaA3pTyrTr P Glv ClnCivS»rL eu tfalTh 
*"I 5|l»H| r HP^XI Hl.li B.nll* BatEII 

fn ; DI * CcoBII MphT 

uw Ha.III Maatll 

Hh.I Sau96I 

360 370 
CCTATCCTCTTAACTCCAC 
r ValStrSar •ocL#uCln 
Pst t 
10



20 



30 



40 



CiATTCATCGAATCTCTTCTGACTCACCCCCCCTCTGTATCTCCTGCA 



50 



60 



TO 



Kl.III HiofI KoilHlncII Hum 

* . - IT T 



SarFI 
120 



130 



CTTCCCCTTCCTCTCACTCTATTCTCCATTCTAATCCCAACACTTATCTCCl^ 

5 " i ::::- i ""-:::;; n" 6 ": c mu A f Tc f c 

Bant Ho 



Kpnl 
MiatV 
JtaaX 

200 



Hp 
Ho 
So 



150 160 'TO 160 too 9nn 

ccccaccccccccaasctcctcatctttaaactatctaatcccttctctoccctacccSJtccittp?^ 

II Hh.X Bbvl S.»3A ""JL.TT^^t 

rFl HlofI FnuHHI Hp. II Hlafl 

BanI Sau3A 
XI. IV Taql 



CTATCTAACTCTCCCTCCTCTCCCACTCTCCCCATCACTCCTCTCCAAM 

a t i «f. BaaI --*1«X Hgal 

SJieix " l * IT HlndITI 



360 
CTAACTCCAC 
o*ocL«uClo 

Fat I 
Haaltl 
10 20 30 40 50 60 70

GAAGTTCAACTGCACCAGTCTGGTXCTCAATTCCTTAA^^ 
ZVQLOQSGPElVXPGASVftHSCKSS 



Bbvl* 
TmMHX 
PttX 



AvaXX 
Sau9(I 



AhaXX HhaX 
BanXHnlX* HlnPX 
EcoRXX fapXMUXX 



MniX* 



Ha«XX 

HhaX 
HlnPX 
NarX 
NlaXV 

scrrx 



NSpHI 



x. 



I 

as 



95 



I 

105 



11S [ 125 135 MS 

CCGTACCCCO^CTCTCATGGTAAGTCTCT ACACIA 
CYROSHCKSLDrXCKATLTVOKSSS 

BanI Bitxi NiaXXX XfiAl *eoI MboXl- 

XpnX Bill HincXX Knit. 

ntnv saix 

Mai TaqX 



PR. J , *j . 

XIO 170 180 190 200 |U0 | 220 

ACTCCTTACATGGAGCTGC UUU i I CACCTCTCACCACTCCCCCCTATACTATTCCCCCCCTATCCATTA7TCC 



T A Y 



M t L 
AlUl 
NlaXXXBbvX- 
7nu4KX 



It S 



LT SCDSAVY 
DdaX HlnfX AeeX 
HnlX*HMX- AccXX 
NapBXX 
SaeXX 



C A R X 0 
ACCXX fill 
ACCXX TaqX 
BaaHII 
HhaX 

KhaX 
HlnPX 
HinPX 



w 
HI 
5 



0 

255 



265 



225 245 
GCCCATCGCCCTACCCTTACCCTCAGCTCCTAAGCATCC 

CHCASVTVSS'CS 
aiv Haall Alul OdaXBamHX 

aoftX HhaX BanXIMitXXNUXV 
HaaXXX HlnPX Btpl2Bf SauJA 

NCOl Nhal HQlAI XhoXX 

NiaXXX SacX 
StyX 




10 20 30 40 50 60 70 

GAATTCATGGCTGACAACAAATTCAACAAGCAACAGCAGAACGCGTTC^^ 

rFMADNXFNXEQQNAFYEILHLPNL 
Ec0 RI Mlul Bglll BspMI+ 

;cnni 

85 95 105 115 125 135 145 

AACCAACACCACCCTAACCCCTTCATCCAXACCTTCAXACACCACCCGTCTCACACCCCTAA 
WEEQRNGFIQ5LXDDPSQSANLLAE 

Hindlll B«pMI+ 

EC047III 

160 170 180 190 200 210 220 

GCCAAGAAACTCAACGACCCTCACGCGCC6AAGAGTGATCCCCAAGTTCAA 
AXKLHDAQAPXSDPEVQLQQSCPEL 

Karl P»tl 

235 245 255 265 275 285 295 

CTTAAACCTCCCCCCTCTCTCCCCATCTCCTCCAAATCCTCTCCCTACATTTTCACCCACTTCTACATCAATTCC 
VKPGASVRMS C X S S G Y I FT. DFYMNW 
Karl FspX 

310 320 330 340 350 360 370 

STTCGCCAGTCTCATGGTAACTCTCTAGACTACATCGGGTACATTTCCCCATACTCTGGGGTTACCGCCTACAAC 
VRQSHGXSLDYICYISPYS GVTGYN 
BstJCI Xbal PflMI BstEII 

385 395 405 415 425 435 445 

CAGAAGTTTAAAGGTAAGGCGACCCTTACTGTCGACAAATCTTCCTCAACTGCTTA 
QXFXGXATLTVOXSSSTAYMELRSi. 
Oral Sail 

460 470 480 490 500 510 520 

ACCTCTCACCACTCCGCGGTATACTATTGCGCGGGCTCCTCTGGTAACAAATGGGCCATCGATTATTCGGGTCAT 
TSEOSAVYYCAGSSGHXWAMDYWGH 
Sad I Kcol 

535 545 555 565- 575 5B5 595 

GGTGCTAGOriTACTGTGAGCTCTCCTCGCGCTCCCTCCGCCCCTCCrCCCTCCGCTCCCCCCG 
GASVTVSSGGGGSGGGGSGGGGS D V 
Nh«I SacI BanHI Aatll 

610 620 €30 640 650 460 670 

CTTCTTACCCACACTCCGClfclll 1 1 I C LU U 'l I X'CTCTGGGTGACCAGC CTTCTATTTCTTCCCC CTCTTCCCAG 
VVTQTPX.5LPVSLGDQASZS CRSSQ 

B«tEII P«1M 

685 695 705 715 725 735 745 

TCTCTGGTCCATlCrAATC^AACACTrACCTGAACTGCTACCTGCAAAAGGCTGGTCAGT 

SLVHSHGKTYLMHYLQXAGQSPXLL 
1 BstXI B«pMI+ Hindlll 

Kpnl 
760 770 780 790 800 810 820

ATCTACAAAGTCTCTAACCGL'I TCI CT CC TGTCCCCCAT CU ni 'CTCT GG IT CTGCTTCTGGTACTGACTTCACC 
lYKVSHRrSGVPDRFSGSGSGTDFT 

835 845 855 865 875 885 695 

CTCAACATCTCTCCTCTCCACCCCCAAGACCTCCCTATCTACTTCTCCTCTCACACTACTCATCTACCCCCCACT 
LKISRVEAEDLCIYFCSQTTHVPPT 
Bglll 

910 920 930 940 

TTTCCTCGTCCCACCAACCTCGACATTAAACCTTAACTCCAC 
FCCCTKLEIKR* 

XhoX HpaX PstI 





jq 20 30 40 SO AO 

GATCCTGACGTCGTAATGACCCAGACTCCGCT6TCTCTCCCGGTT7CTCTGGGTGACCAG 
DPOVVrtTQTPUSLPVSLOOO 
Altlt B-tEII 

70 BO 90 lOO 110 120 

GCTTCTATTTCTTGCCGCTCTTCCCAGTCTCTGGTCCATTCTAATGGTAACACTTACCTO 
A61 BCRBB08LVM8-NOHTYL 

Pflfll B»tXX 

130 HO 190 160 170 1B0 

AACTG3TACCTGCAAAAGGCTGGTCAGTCTCCGAAGCTTCTGATCTACAAAGTCTCTAAC 
NMVLOKAGOBPKLLX VKVSN 
BspflX* Hindi IX 

Kpnl . 

190 200 210 220 230 240 

CGCTTC7C7GGTGTCCCGGATCBTTTCTCTGGTTCTGGTTCVGGTACTGACTTCACCCTG 
RFSGVPDRF80BBS3T0FT L 

230 260 270 2B0 290 300 

AAGATCTCTCGTGTCGAGGCCGAAGACCTGGGTATCTACTTCTGCTCTCAGACTACTCAT 
K I SRVEAEDLGIYFCSOT TH 
BolII 

-» 1C z^O ' 330 340 350 360 

GTACCGCCGACTTTTGGTGGTGGCACCAAGCTCGAGATTAAACGTGGATCTGGAGGTGGC 
VPPTFGGGTKL6I KRGSGGG 

Xhol 

370 3B0 390 400 410 420 

GGATCTGGTGGAGGTGGCTCTGGTGGCGGTGGATCCGAA6TTCAATTGCAGCAGTCTGGT 
BSGGGSSGGGGSEVOLOOSG 

B*«H2 

430 440 430 460 470 480 

CCTGAATTGGT7AAACCTGGCGCCTC7GT6CGCATGTCCTGCAAATCCTCTGGGTACATT 
PELVK PGASVRflSCKSSGV: 
N«rl F»pl 

490 300 310 320 330 340 

TTCACCGACTTCTACATGAATTGGGTTCGCCAGTCTCATGGTAAGTCTCTAGACTACATC 
FTDFYMNWVROSHBKSLDYX 

BltXX Xb*I 

330 360 370 * 380 390 . 600 

GGGTACATTTCCCCATACTCT6GGGTTACCGGCTACAACCAGAAGTTTAAAGGTAAGGCG 
0Y1SPVS6VT0YN0KFKSKA 
P-flMI titEII - Dr*I * 

610 620 630 640 630 660 

ACCCTTACTGTCGACAAATCTTCCTCAACTGCTTACATGGAGCTGCGTTCTTTGACCTCT
TLTVDKSSSTAYMELRSLTS
SalI

670 680 690 700 710 720 

GAGGACTCCGCGGTATACTATTGCGCGGGCTCCTCTGGTAACAAATGGGCCATGGATTAT
EDSAVYYCAGSSGNKWAMDY

SacII NcoI

730 740 750 760

TGGGGTCATGGTGCTAGCGTTACTGTGAGCTCTTAACTGCAG
WGHGASVTVSS*
10 20 30 40 50 60

CJL\CTTCAACTGCACCACTCTCCTCCTGCATTCCTrCCACCTTCCCAGACTCTCTCCCTC 
EVQLEQSCPCLVRPSQTLSL 

70 80 90 100 110 120

ACCTdCACATCCTCTCCCTACATTTTCACCCACTTCTACATCAATTCCCTTCCCCACCCT 

TCTS SGYI FTDFYMNWVRQP 
BspMI+ BstXI

130 140 150 160 170 180

CCTCCTCCCCCTCTACACTACATCCCCTACATTrCCCCATACTCTCCCCTTACCCCCTAC 
PGRGLDYICYISPYSCVTCY 
XbaI PflMI BstEII

190 200 210 220 230 240 

AACCAGAAGTTTAAAGGTAAGGCGACCCTTCXGCTCAACAAATCTAACAACCACCCTTCC 
NQXFKGKATLLVNXSKNQAS 
DraI

250 260 270 280 290 300 

CTGCGGCTCTCTTCTCTGACCGCTCCCCACACCCCGCTAT\CTATTCCCCCGCCTCCTCT 
LRLS SVTAADTAVYYCAGSS 

SacII 

310 320 330 340 350 360 

GGTAACAAATGCGCCATGCATTATTGGGGTCACmGTTCTCTGGTTACTGI CAGCTCTGGT 
GNKWAMDYWGQGSLVTVSSG
NcoI SacI

370 380 390 400 410 420

CGCCGTGGGTCGGGCGGTGGTGGCTCGGGTCCCGCCCGATCCGACGTCGTTATGACCCAG 
GGGSGGGGSGGGGSDVVMTQ

BamHI AatII

430 440 450 460 470 480 

CCTCCGTCGGTTTCGGGCCCTCCTGGTCACCCCCTTKCTATTTCTTCCCCCTCTTCCCAG 
PPSVSCAPGQRVTISCRS SQ 

PflM 

490 500 510 520 530 540

TCTCTGCTCCATTCTAATGGTAACACTTACCTCAACTCGTACCAGCAACTCCCTCCTACG 

SLVHSNGNTYLNWYQQLPGT
I BstXI KpnI

550 560 570 580 590 600

CCTCCCAAGCTTCTCATCTACAAAGTCTCTAACCCCTTCTCTCCTCTCCCGGATCGTrTC 
APKLLIYKVSNRFSGVPDRF
HindIII

610 620 630 640 650 660

TCIGGI ICI G GI ICT G GTACTGACTTCACCCTGGCGATCACTGGTCTCCAGGCCGAAGAC 
SGSGSGTDFTLAITGLQAED 

670 680 690 700 710 720 

G AGG CTGACTA CITCTC CTCTCAGACTACTCATGTACCG CCG A C 1 l ' rr GCTCGTGC CACC 
EADYFCSQTTHVPPTFG GGT 

730 740 750

AAGCTCACCGTTCTGCGTTAACTGCAG

KLTVLR*LQ 
HpaI PstI
10 20 30 40 50 60

CAATTC(^CTTCAACTCCXCCXCTCTCCTCCTCXArrCCTTXAACCTCCCCCCTCrCTC 
EFEVQLQQSGPELVKPGASV 
AauXI PstI NarI Fs

EcoRI

70 80 90 100 110 120

CCCXTCTCCTCOUUTCCrrCTCCCTACACCTTCACaUCTATTACATCCACTCCCTTJaC 

R H S C X 5 SCYTFTHYYXRWLX 
Pi AX1II 

130 140 150 160 170 180

CAGTCTCATCCTAACTCTCTACACTCCATCCCTTCCATTTACCCCCCTAATCCTAACACT 
Q SHCXSLEWICUZYPCNCNT 

XbaI sui

190 200 210 220 230 240 

AAGTACAATGAGAACTTTAAACGTAACCCGACCCTTACTCTCCACAAATCTTCCTCA/>£T 
KYXENF KGXATLTVDXSSST 
DraI SalI

250 260 270 280 290 300

CCTTACATCCACCTCCCTTCTTTCACCTCTGACCACTC.TCCCCTATACTATTCCCCCCCT 
AYMELRSLTS EDSAVYYCAR 

SacII BssHII

310 320 330 340 350 360

TACACTCATTATTACTTCCATTATTGCCGCCATCCCGCTACCCTTACCGTCACCTCTCCT 
YTHYYFDYWCHGASVTVSSC 

NcoI NheI SacI

370 380 390 400 410 420

CCCCGTCCCTCCCCCCGTCCTGCCTCGGCTCGCCGCCCATCCCACCTCCTTATCACCCAC 
CGCSGG GSGGGG5DVVMTQ 

BamHI AatII

430 440 450 460 470 480

ACTCCCCTGTCTCTCCCGUlilLlC.IGCCTCACCACGLllLlAlliLlll.CLl.tiLAlLC 
TFLSLPVSLGDQASXSCRS5 

BstEII

490 500 510 520 530 540

CACTCTATCCTCCATrcrAATCCTAACACTTACCIGGACTCCTACCTCCAAAACCCTCCT 
0 S I VH5NGNTYLEWYLQKAG 
BstXI BspMI*

KpnI

550 560 570 580 590 600

CACTCTCCCAAJ C 1111 CA TCTACAAAGTCTCTAACCC L 1 1 L A Ll UUi Vl CCCCCATCCT 
QSPXLLIYKVSHRrSGVPDR 
HindIII

610 620 630 640 650 660 

II ClClUUAlLAUUlALlC CTACTCACTTCACCeTCAAC A TCTC l CUA UlCa AGGCCCAG 
FSG SCSGTDFTLXI5RVEAE 

BglII

670 680 690 700 710 720 

CATCTGCGTATCTACTACTCCTTCCAACCCTCTXATCTACCCTGCACTTTCCCCCCTCCG 
OLGXYYCFQCSHVPWTFGCC 

730 740 750 

ACCAAGCTCGAGATTAAACGTTAACTGCAG

TKLEIKR*LQ

XhoI HpaI PstI
10 20 30 40 50 60

GATCCCGACGTTATGCTGGTTCAATCTGGTCCACTACTCATCGAACCTGGTCGGTCCCTC 
DPEVMLVESGGVLMEPGGSL

ScaI EcoO

70 80 90 100 110 120

AAGCTCAG CTGTG CTC CTAG CCC CTTCACGTTCTCTCGTTACG CCATGTCTTGCGTCCGT 
KLSCAASGFTFSRYAMSWVR
EspI NheI PflMI

130 140 150 160 170 180 

CAGACTCCGGAGAAG CGTCTAGAGTGGGTCCCGAC6ATA 1 C I 1 L 1 U GTGGTTCTCACACC 
QTPEKRLEWVATISSGGSHT

BspMII XbaI NruI EcoRV

190 200 210 220 230 240 

TTCCATCCAGACACTGTGAACGGTCGATTCACGATCTCTCGAGACAACGCTAACAACACC 
FHPDSVKGRFTISRDNAKNT

XhoI

250 260 270 280 290 300

TTGTACCTC CAAATGT CTTCTCTACGTAGTGAAG ATACTC CTATCTACTACTCTC CACCT 
LYLQMSSLRSEDTAMYYCAR
BspMI* SnaBI ApaLI

310 320 330 340 350 360 

CCTCCACTGATCTCACTAGTTCCTCATTATCCCATGGATTATTGCCCTCATGGTGCTAGC 
PPLISLVADYAMDYWGHGAS
SpeI NcoI NheI

370 380 390 400 410 420

GTTACTGTGAGCTCTGGTGGCGGTGCGTCGCGCCGTGCTGCCTCGCCTCGCGGCGGATCG 
VTVSSGGGGSGGGGSGGGGS
SacI

430 440 450 460 470 480 

GATATCCTTATGACTCACTCTCATAAGTTCATCTCCACTTCTGTTGGTGACCCTCTrTCT 

DIVMTQSHKFMSTSVGDRVS
EcoRV BstEII

490 500 510 520 530 540 

ATCACTTGTAAGGCCAGCCAGGATGTGGGTGCTGCTATCGCATCCTATCACCACAAGCCC 
ITCKASQDVGAAIAWYQQKP
PflMI Sma 

550 560 570 580 590 600

CCCCACTCTCCTAACCTCCTGATCTACTCCCCCTCGACTCGTCATACTCGTGTCCCGGAT 

GQSPKLLIYWASTRHTGVPD
I Sfell 




610 620 630 640 650 660 

CGTTTCACTCCGTCCCGATCAGGTACTWTITCACTCTCACTATTTCGJUiCGTTCAGTCT 
RFTGSGSGTDFTLTISNVQS 
BspMII AsuII

670 680 690 700 710 720 

CATCACCTCGCTGATTACTTCTCCCAGCAATATTCCGGGTACCCTCTCACTTTCCCTCCC 